9fans - fans of the OS Plan 9 from Bell Labs
* [9fans] fossil+venti performance question
@ 2015-05-04  9:32 KADOTA Kyohei
  2015-05-04 16:10 ` Anthony Sorace
  0 siblings, 1 reply; 52+ messages in thread
From: KADOTA Kyohei @ 2015-05-04  9:32 UTC (permalink / raw)
  To: 9fans

Hello, fans.

I’m running Plan 9 (labs) on a public QEMU/KVM service.
My Plan 9 system has a slow read performance problem.
I ran 'iostats md5sum /386/9pcf' with DMA on; the read result is 150KB/s,
but write performance is fast.

My Plan 9 system has a 200GB HDD, formatted with fossil+venti.
The disk layout is:

- 9fat	100MB
- nvram	512B
- fossil	31.82GB
- arenas	159.11GB
- isect	7.95GB
- bloom	512MB
- swap	512MB

For comparison, I also tested two other installations:

1) 200GB HDD with fossil only.
2) 100GB HDD with fossil+venti.

Read performance is fast (about 15MB/s) in both installations.

Could you tell me the reason?


^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [9fans] fossil+venti performance question
  2015-05-04  9:32 [9fans] fossil+venti performance question KADOTA Kyohei
@ 2015-05-04 16:10 ` Anthony Sorace
  2015-05-04 18:11   ` Aram Hăvărneanu
  2015-05-05 14:47   ` KADOTA Kyohei
  0 siblings, 2 replies; 52+ messages in thread
From: Anthony Sorace @ 2015-05-04 16:10 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

The reason, in general:
In a fossil+venti setup, fossil runs (basically) as a
cache for venti. If your access just hits fossil, it’ll
be quick; if not, you hit the (significantly slower)
venti. I bet if you re-run the same test twice in a
row, you’re going to see dramatically improved
performance. Try it. If that’s true, the question is
really one of venti performance; if not, you may
have another system config issue.
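
To quantify the cold/warm difference, here is a minimal, hypothetical timing
harness (portable POSIX C rather than anything Plan 9-specific; the file path
is only an example) that reads the same file twice and prints the throughput
of each pass, so the cold path and the cached path can be compared directly:

	#include <stdio.h>
	#include <stdlib.h>
	#include <fcntl.h>
	#include <unistd.h>
	#include <sys/time.h>

	/* read the whole file sequentially and return MB/s */
	static double
	pass(char *path)
	{
		char buf[8192];
		long long total = 0;
		ssize_t n;
		struct timeval t0, t1;
		double secs;
		int fd;

		fd = open(path, O_RDONLY);
		if(fd < 0){
			perror(path);
			exit(1);
		}
		gettimeofday(&t0, 0);
		while((n = read(fd, buf, sizeof buf)) > 0)
			total += n;
		gettimeofday(&t1, 0);
		close(fd);
		secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_usec - t0.tv_usec)/1e6;
		if(secs <= 0)
			secs = 1e-9;
		return total/secs/1e6;
	}

	int
	main(int argc, char **argv)
	{
		char *path = argc > 1 ? argv[1] : "/386/9pcf";	/* example file */

		printf("cold read: %.2f MB/s\n", pass(path));	/* first pass: likely venti */
		printf("warm read: %.2f MB/s\n", pass(path));	/* second pass: fossil/RAM cache */
		return 0;
	}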

There are various changes you can make to how
venti uses disk/memory that can speed things up,
but I don’t have a good handle on which to
suggest first.
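
For reference, the knobs in question are mostly the cache sizes in venti's
configuration. A purely illustrative venti.conf sketch (the paths and sizes
are placeholders, not a recommendation; mem is the lump cache, bcmem the
block cache and icmem the index cache, per venti(8)):

	index main
	arenas /dev/sdC0/arenas
	isect /dev/sdC0/isect
	bloom /dev/sdC0/bloom
	mem 32m
	bcmem 48m
	icmem 64m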

Your write performance in that test isn’t really
relevant: they’re not hitting the file system at all.

I’m not sure why you’d see a difference in a
fossil+venti setup of a different size, but the
partition size relationships, and the in-memory
cache size relationships, are what’s mostly important.

a




^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [9fans] fossil+venti performance question
  2015-05-04 16:10 ` Anthony Sorace
@ 2015-05-04 18:11   ` Aram Hăvărneanu
  2015-05-04 18:51     ` David du Colombier
  2015-05-05 15:07     ` KADOTA Kyohei
  2015-05-05 14:47   ` KADOTA Kyohei
  1 sibling, 2 replies; 52+ messages in thread
From: Aram Hăvărneanu @ 2015-05-04 18:11 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

I have seen the same problem a few years back on about half of my
machines. The other half were fine. There was a 1000x difference in
performance between the good and bad machines. I have spent some time
debugging this, but unfortunately, I couldn't find the root cause, and
I just stopped using fossil.

It only happens when fossil is used with Plan 9 venti, it does not
happen when fossil is used by itself, and it does not happen when
fossil is used with plan9port venti.

In all these scenarios, the data is present in fossil and does not
need to be fetched from venti, so venti performance is not the
issue. The problem is that the mere presence of Plan 9 venti induces
this problem somewhere else (fossil or the kernel).

-- 
Aram Hăvărneanu



^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [9fans] fossil+venti performance question
  2015-05-04 18:11   ` Aram Hăvărneanu
@ 2015-05-04 18:51     ` David du Colombier
  2015-05-05 14:29       ` Sergey Zhilkin
  2015-05-05 15:05       ` Charles Forsyth
  2015-05-05 15:07     ` KADOTA Kyohei
  1 sibling, 2 replies; 52+ messages in thread
From: David du Colombier @ 2015-05-04 18:51 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

I'm experiencing the same issue as well.

When I launch vacfs on the same machine as Venti,
reading is very slow. When I launch vacfs on another
Plan 9 or Unix machine, reading is fast.

I've just made some measurements when reading a file:

Vacfs running on the same machine as Venti: 151 KB/s
Vacfs running on another machine: 5131 KB/s

--
David du Colombier



^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [9fans] fossil+venti performance question
  2015-05-04 18:51     ` David du Colombier
@ 2015-05-05 14:29       ` Sergey Zhilkin
  2015-05-05 15:05       ` Charles Forsyth
  1 sibling, 0 replies; 52+ messages in thread
From: Sergey Zhilkin @ 2015-05-05 14:29 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

[-- Attachment #1: Type: text/plain, Size: 1353 bytes --]

Hello!

imho, placing fossil, venti, isect, bloom and swap on a single drive is a bad
idea.
As written in http://plan9.bell-labs.com/sys/doc/venti/venti.html: "The
prototype Venti server is implemented for the Plan 9 operating system in
about 10,000 lines of C. The server runs on a dedicated dual 550Mhz Pentium
III processor system with 2 Gbyte of memory and is accessed over a 100Mbs
Ethernet network. The data log is stored on a 500 Gbyte MaxTronic IDE Raid
5 Array and the index resides on a string of 8 Seagate Cheetah 18XL 9 Gbyte
SCSI drives."

A good idea is to store isect on multiple SSD drives :) to speed up searches.

My small installation: an 80GB PATA drive for 9fat, fossil and swap, a 40GB
PATA drive for isect and bloom, and a 1TB SATA drive for arenas. No RAID.

2015-05-04 21:51 GMT+03:00 David du Colombier <0intro@gmail.com>:

> I'm experiencing the same issue as well.
>
> When I launch vacfs on the same machine as Venti,
> reading is very slow. When I launch vacfs on another
> Plan 9 or Unix machine, reading is fast.
>
> I've just made some measurements when reading a file:
>
> Vacfs running on the same machine as Venti: 151 KB/s
> Vacfs running on another machine: 5131 KB/s
>
> --
> David du Colombier
>
>


-- 
With best regards
Zhilkin Sergey

[-- Attachment #2: Type: text/html, Size: 2324 bytes --]

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [9fans] fossil+venti performance question
  2015-05-04 16:10 ` Anthony Sorace
  2015-05-04 18:11   ` Aram Hăvărneanu
@ 2015-05-05 14:47   ` KADOTA Kyohei
  2015-05-05 15:46     ` steve
  1 sibling, 1 reply; 52+ messages in thread
From: KADOTA Kyohei @ 2015-05-05 14:47 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

Thanks Anthony.

> I bet if you re-run the same test twice in a
> row, you’re going to see dramatically improved
> performance.

I tried re-running 'iostats md5sum /386/9pcf'.
The second read is very fast.

The first read result is 152KB/s.
The second read result is 232MB/s.

> Your write performance in that test isn’t really
> relevant: they’re not hitting the file system at all.

I also tried writing 1GB of data to the filesystem:

	iostats dd -if /dev/zero -of output -ibs 1024k -obs 1024k -count 1024

The dd write result is 31MB/s.
But this test may only write to fossil; it may not write to venti.

> I’m not sure why you’d see a difference in a
> fossil+venti setup of a different size, but the
> partition size relationships, and the in-memory
> cache size relationships, are what’s mostly important.

My hardware has 2GB of memory.
The Plan 9 configuration is almost default (except /dev/sdC0/bloom).
Increasing the memory size is difficult,
because it is determined by the public QEMU/KVM service plan.

—
kadota


^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [9fans] fossil+venti performance question
  2015-05-04 18:51     ` David du Colombier
  2015-05-05 14:29       ` Sergey Zhilkin
@ 2015-05-05 15:05       ` Charles Forsyth
  2015-05-05 15:38         ` David du Colombier
  1 sibling, 1 reply; 52+ messages in thread
From: Charles Forsyth @ 2015-05-05 15:05 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

[-- Attachment #1: Type: text/plain, Size: 290 bytes --]

On 4 May 2015 at 19:51, David du Colombier <0intro@gmail.com> wrote:

>
> I've just made some measurements when reading a file:
>
> Vacfs running on the same machine as Venti: 151 KB/s
> Vacfs running on another machine: 5131 KB/s


How many times do you time it on each machine?

[-- Attachment #2: Type: text/html, Size: 585 bytes --]

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [9fans] fossil+venti performance question
  2015-05-04 18:11   ` Aram Hăvărneanu
  2015-05-04 18:51     ` David du Colombier
@ 2015-05-05 15:07     ` KADOTA Kyohei
  1 sibling, 0 replies; 52+ messages in thread
From: KADOTA Kyohei @ 2015-05-05 15:07 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

Thanks Aram.

> I have spent some time
> debugging this, but unfortunately, I couldn't find the root cause, and
> I just stopped using fossil.

I tried to measure the performance effect of replacing components:

1) mbr or GRUB
2) pbs or pbslba
3) sdata or sdvirtio (sdvirtio is imported from 9legacy)
4) kernel configurations (9pcf, 9pccpuf, 9pcauth, etc)

Unfortunately, none of the above had any effect on performance.

—
kadota


^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [9fans] fossil+venti performance question
  2015-05-05 15:05       ` Charles Forsyth
@ 2015-05-05 15:38         ` David du Colombier
  2015-05-05 22:23           ` Charles Forsyth
  0 siblings, 1 reply; 52+ messages in thread
From: David du Colombier @ 2015-05-05 15:38 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

>> I've just made some measurements when reading a file:
>>
>> Vacfs running on the same machine as Venti: 151 KB/s
>> Vacfs running on another machine: 5131 KB/s
>
>
> How many times do you time it on each machine?

Maybe ten times. The results are always the same, within ~5%.
Also, I restarted vacfs between each try.

It's easy to reproduce this issue with vacfs. I think
anyone running Venti on Plan 9 can observe this problem.

--
David du Colombier



^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [9fans] fossil+venti performance question
  2015-05-05 14:47   ` KADOTA Kyohei
@ 2015-05-05 15:46     ` steve
  2015-05-05 15:54       ` David du Colombier
  0 siblings, 1 reply; 52+ messages in thread
From: steve @ 2015-05-05 15:46 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

I too see this, and feel (no proof) that things used to be better. I.e. the first time I read a file from venti it is very, very slow; subsequent reads from the RAM cache are quick.

I think venti used to be faster a few years ago. Maybe another effect of this is that boot time seems slower than it used to be.

sorry to be vague.

-Steve





> On 5 May 2015, at 15:47, KADOTA Kyohei <lufia@me.com> wrote:
> 
> Thanks Anthony.
> 
>> I bet if you re-run the same test twice in a
>> row, you’re going to see dramatically improved
>> performance.
> 
> I tried re-running 'iostats md5sum /386/9pcf'.
> The second read is very fast.
> 
> The first read result is 152KB/s.
> The second read result is 232MB/s.
> 
>> Your write performance in that test isn’t really
>> relevant: they’re not hitting the file system at all.
> 
> I also tried writing 1GB of data to the filesystem:
> 
>    iostats dd -if /dev/zero -of output -ibs 1024k -obs 1024k -count 1024
> 
> The dd write result is 31MB/s.
> But this test may only write to fossil; it may not write to venti.
> 
>> I’m not sure why you’d see a difference in a
>> fossil+venti setup of a different size, but the
>> partition size relationships, and the in-memory
>> cache size relationships, are what’s mostly important.
> 
> My hardware has 2GB of memory.
> The Plan 9 configuration is almost default (except /dev/sdC0/bloom).
> Increasing the memory size is difficult,
> because it is determined by the public QEMU/KVM service plan.
> 
> —
> kadota



^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [9fans] fossil+venti performance question
  2015-05-05 15:46     ` steve
@ 2015-05-05 15:54       ` David du Colombier
  0 siblings, 0 replies; 52+ messages in thread
From: David du Colombier @ 2015-05-05 15:54 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

> I too see this, and feel (no proof) that things used to be better. I.e. the first time I read a file from venti it is very, very slow; subsequent reads from the RAM cache are quick.
>
> I think venti used to be faster a few years ago. Maybe another effect of this is that boot time seems slower than it used to be.
>
> sorry to be vague.

I'm pretty sure this issue started something like two years ago.
It looks like a regression somewhere.

--
David du Colombier



^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [9fans] fossil+venti performance question
  2015-05-05 15:38         ` David du Colombier
@ 2015-05-05 22:23           ` Charles Forsyth
  2015-05-05 22:29             ` cinap_lenrek
  2015-05-05 22:33             ` David du Colombier
  0 siblings, 2 replies; 52+ messages in thread
From: Charles Forsyth @ 2015-05-05 22:23 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

[-- Attachment #1: Type: text/plain, Size: 1479 bytes --]

On 5 May 2015 at 16:38, David du Colombier <0intro@gmail.com> wrote:

> > How many times do you time it on each machine?
>
> Maybe ten times. The results are always the same, within ~5%.
> Also, I restarted vacfs between each try.


It was the effect of the ram caches that prompted the question.

My experience is similar to Steve's: it was faster, and now it's initially
very very slow.
I looked at changes from that version of venti to this, and I didn't see
anything that would cause that.
(The problem could be outside venti, but I looked at some possibly relevant
kernel changes too.)

Note that the raw drive speed on my venti machine is fine (no doubt it
could be better, but it's fine).
I convinced myself through experiments that the problem was with venti, not
fossil.
I used some debugging code in venti and had the impression that it took a
surprisingly long time
to handle each request: that the time was in venti. The effect was similar
to that of a lost
interrupt for a device driver. I used ratrace on it, but didn't spot an
obvious culprit.
I was tempted to rip out or disable the drive scheduling code in venti to
see what happened, but
not for the first time I ran out of time and had to get back to some other
work.

One thing I didn't know was that the results were different when fossil was
on a different machine.
I thought I'd tried that with vacfs myself, but apparently not, or mine was
as slow as when it was on the same machine.

[-- Attachment #2: Type: text/html, Size: 2284 bytes --]

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [9fans] fossil+venti performance question
  2015-05-05 22:23           ` Charles Forsyth
@ 2015-05-05 22:29             ` cinap_lenrek
  2015-05-05 22:33             ` David du Colombier
  1 sibling, 0 replies; 52+ messages in thread
From: cinap_lenrek @ 2015-05-05 22:29 UTC (permalink / raw)
  To: 9fans

semlocks?

anyway, should not be too hard to figure out with /n/dump

--
cinap



^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [9fans] fossil+venti performance question
  2015-05-05 22:23           ` Charles Forsyth
  2015-05-05 22:29             ` cinap_lenrek
@ 2015-05-05 22:33             ` David du Colombier
  2015-05-05 22:53               ` Aram Hăvărneanu
  1 sibling, 1 reply; 52+ messages in thread
From: David du Colombier @ 2015-05-05 22:33 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

Yes, I'm pretty sure it's not related to Fossil, since it happens with
vacfs as well.
Also, Venti has been pretty much unchanged over the last few years.

I suspected it was related to the lock change on 2013-09-19.

https://github.com/0intro/plan9/commit/c4d045a91e

But I remember I tried to revert this change and the problem
was still present. Maybe I should try again to be sure.

--
David du Colombier



^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [9fans] fossil+venti performance question
  2015-05-05 22:33             ` David du Colombier
@ 2015-05-05 22:53               ` Aram Hăvărneanu
  2015-05-06 20:55                 ` David du Colombier
  2015-05-07  3:43                 ` erik quanstrom
  0 siblings, 2 replies; 52+ messages in thread
From: Aram Hăvărneanu @ 2015-05-05 22:53 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

It's pretty interesting that at least three people all got exactly
150kB/s on vastly different machines, both real and virtual. Maybe the
number comes from some tick frequency?
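
Back-of-the-envelope, and purely speculative: if a transfer only advances by
one default-MSS-sized chunk (1460 bytes) per 10 ms clock tick (HZ=100), you
land almost exactly on that number. A trivial check of the arithmetic:

	#include <stdio.h>

	int
	main(void)
	{
		int mss = 1460;	/* default TCP MSS for a 1500-byte MTU */
		int hz = 100;	/* a common tick frequency */

		/* one MSS-sized segment making progress per tick */
		printf("%d bytes/s (~%d KB/s)\n", mss*hz, mss*hz/1000);
		return 0;
	}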

-- 
Aram Hăvărneanu



^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [9fans] fossil+venti performance question
  2015-05-05 22:53               ` Aram Hăvărneanu
@ 2015-05-06 20:55                 ` David du Colombier
  2015-05-06 21:17                   ` Charles Forsyth
  2015-05-07  3:43                 ` erik quanstrom
  1 sibling, 1 reply; 52+ messages in thread
From: David du Colombier @ 2015-05-06 20:55 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

Just to be sure, I tried again, and the issue is not related
to the lock change on 2013-09-19.

However, now I'm sure the issue was caused by a kernel
change in 2013.

There is no problem when running a kernel from early 2013.

--
David du Colombier



^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [9fans] fossil+venti performance question
  2015-05-06 20:55                 ` David du Colombier
@ 2015-05-06 21:17                   ` Charles Forsyth
  2015-05-06 21:26                     ` David du Colombier
  0 siblings, 1 reply; 52+ messages in thread
From: Charles Forsyth @ 2015-05-06 21:17 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

[-- Attachment #1: Type: text/plain, Size: 268 bytes --]

On 6 May 2015 at 21:55, David du Colombier <0intro@gmail.com> wrote:

> However, now I'm sure the issue was caused by a kernel
> change in 2013.
>
> There is no problem when running a kernel from early 2013.
>

Welly, welly, welly, well. That is interesting.

[-- Attachment #2: Type: text/html, Size: 855 bytes --]

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [9fans] fossil+venti performance question
  2015-05-06 21:17                   ` Charles Forsyth
@ 2015-05-06 21:26                     ` David du Colombier
  2015-05-06 21:28                       ` David du Colombier
  2015-05-07  3:38                       ` erik quanstrom
  0 siblings, 2 replies; 52+ messages in thread
From: David du Colombier @ 2015-05-06 21:26 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

I got it!

The regression was caused by the NewReno TCP
change on 2013-01-24.

https://github.com/0intro/plan9/commit/e8406a2f44

--
David du Colombier



^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [9fans] fossil+venti performance question
  2015-05-06 21:26                     ` David du Colombier
@ 2015-05-06 21:28                       ` David du Colombier
  2015-05-06 22:28                         ` Charles Forsyth
  2015-05-06 22:35                         ` Steven Stallion
  2015-05-07  3:38                       ` erik quanstrom
  1 sibling, 2 replies; 52+ messages in thread
From: David du Colombier @ 2015-05-06 21:28 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

Since the problem only happens when Fossil or vacfs is running
on the same machine as Venti, I suppose this is somewhat related
to how TCP behaves over the loopback.

--
David du Colombier



^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [9fans] fossil+venti performance question
  2015-05-06 21:28                       ` David du Colombier
@ 2015-05-06 22:28                         ` Charles Forsyth
  2015-05-07  3:35                           ` erik quanstrom
  2015-05-06 22:35                         ` Steven Stallion
  1 sibling, 1 reply; 52+ messages in thread
From: Charles Forsyth @ 2015-05-06 22:28 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

[-- Attachment #1: Type: text/plain, Size: 616 bytes --]

On 6 May 2015 at 22:28, David du Colombier <0intro@gmail.com> wrote:

> Since the problem only happens when Fossil or vacfs is running
> on the same machine as Venti, I suppose this is somewhat related
> to how TCP behaves over the loopback.
>

Interesting. That would explain the clock-like delays.
Possibly it's nearly zero RTT in initial exchanges and then when venti has
to do some work,
things time out. You'd think it would only lead to needless retransmissions
not increased latency
but perhaps some calculation doesn't work properly with tiny values,
causing one side to back off
incorrectly.
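
As a rough illustration of the tiny-value case, here is a textbook-style
RTT/RTO estimator sketch (integer milliseconds, RFC 6298-style smoothing;
this is not the Plan 9 code, and the clamp bounds are only example values):

	#include <stdio.h>

	/* Textbook RTO estimation (RFC 6298 style), integer milliseconds.
	   Not the Plan 9 code; it just shows what near-zero RTT samples do. */
	enum { Minrto = 300, Maxrto = 64000 };	/* example clamp bounds */

	static int init, srtt, rttvar, rto = 1000;

	static void
	sample(int rtt)
	{
		int d;

		if(!init){
			init = 1;
			srtt = rtt;
			rttvar = rtt/2;
		}else{
			d = rtt - srtt;
			if(d < 0)
				d = -d;
			rttvar += (d - rttvar)/4;
			srtt += (rtt - srtt)/8;
		}
		rto = srtt + 4*rttvar;
		if(rto < Minrto)
			rto = Minrto;
		if(rto > Maxrto)
			rto = Maxrto;
	}

	int
	main(void)
	{
		int i;

		for(i = 0; i < 5; i++){
			sample(0);	/* loopback: measured RTT rounds down to 0 ticks */
			printf("srtt=%d rttvar=%d rto=%d ms\n", srtt, rttvar, rto);
		}
		return 0;
	}

With every sample rounding to zero, srtt and the deviation stay at zero and
the timeout sits on its clamp floor, so behaviour ends up dominated by timer
granularity rather than by the real round trip.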

[-- Attachment #2: Type: text/html, Size: 1091 bytes --]

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [9fans] fossil+venti performance question
  2015-05-06 21:28                       ` David du Colombier
  2015-05-06 22:28                         ` Charles Forsyth
@ 2015-05-06 22:35                         ` Steven Stallion
  2015-05-06 23:47                           ` Charles Forsyth
  1 sibling, 1 reply; 52+ messages in thread
From: Steven Stallion @ 2015-05-06 22:35 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

[-- Attachment #1: Type: text/plain, Size: 534 bytes --]

Definitely interesting, and explains why I've never seen the regression (I
switched to a dedicated venti server a couple of years ago). Were these the
changes that erik submitted? ISTR him working on reno bits somewhere around
there...

On Wed, May 6, 2015 at 4:28 PM, David du Colombier <0intro@gmail.com> wrote:

> Since the problem only happens when Fossil or vacfs is running
> on the same machine as Venti, I suppose this is somewhat related
> to how TCP behaves over the loopback.
>
> --
> David du Colombier
>
>

[-- Attachment #2: Type: text/html, Size: 892 bytes --]

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [9fans] fossil+venti performance question
  2015-05-06 22:35                         ` Steven Stallion
@ 2015-05-06 23:47                           ` Charles Forsyth
  0 siblings, 0 replies; 52+ messages in thread
From: Charles Forsyth @ 2015-05-06 23:47 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

[-- Attachment #1: Type: text/plain, Size: 225 bytes --]

On 6 May 2015 at 23:35, Steven Stallion <sstallion@gmail.com> wrote:

> Were these the changes that erik submitted?


I don't think so. Someone else submitted a different set of tcp changes
independently much earlier.

[-- Attachment #2: Type: text/html, Size: 512 bytes --]

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [9fans] fossil+venti performance question
  2015-05-06 22:28                         ` Charles Forsyth
@ 2015-05-07  3:35                           ` erik quanstrom
  2015-05-07  6:15                             ` David du Colombier
  0 siblings, 1 reply; 52+ messages in thread
From: erik quanstrom @ 2015-05-07  3:35 UTC (permalink / raw)
  To: 9fans

On Wed May  6 15:30:24 PDT 2015, charles.forsyth@gmail.com wrote:

> On 6 May 2015 at 22:28, David du Colombier <0intro@gmail.com> wrote:
> 
> > Since the problem only happens when Fossil or vacfs is running
> > on the same machine as Venti, I suppose this is somewhat related
> > to how TCP behaves over the loopback.
> >
> 
> Interesting. That would explain the clock-like delays.
> Possibly it's nearly zero RTT in initial exchanges and then when venti has
> to do some work,
> things time out. You'd think it would only lead to needless retransmissions
> not increased latency
> but perhaps some calculation doesn't work properly with tiny values,
> causing one side to back off
> incorrectly.

i don't think that's possible.

NOW is defined as MACHP(0)->ticks, so this is a pretty coarse timer
that can't go backwards on intel processors.  this limits the timer's resolution to HZ,
which on 9atom is 1000, and 100 on pretty much anything else.  further limiting the
resolution is the tcp retransmit timers which according to presotto are
	/* bounded twixt 0.3 and 64 seconds */
so i really doubt the retransmit timers are resending anything.  if someone
has a system that isn't working right, please post /net/tcp/<connectionno>/^(local remote status)
i'd like to have a look.

quoting steve stallion ...

> > Definitely interesting, and explains why I've never seen the regression (I
> > switched to a dedicated venti server a couple of years ago). Were these the
> > changes that erik submitted? ISTR him working on reno bits somewhere around
> > there...
>
> I don't think so. Someone else submitted a different set of tcp changes
> independently much earlier.

just for the record, the earlier changes were an incorrect partial implementation of
reno.  i implemented newreno from the specs and added corrected window scaling
and removed the problem of window slamming.  we spent a month going over cases
from 50µs to 100ms rtt latency and showed that we got near the theoretical max for
all those cases.  (big thanks to bruce wong for putting up with early, buggy versions.)

during the investigation of this i found that loopback *is* slow for reasons i don't
completely understand.  part of this was the terrible scheduler.  as part of the gsoc
work, we were able to make the nix scheduler not howlingly terrible for 1-8 cpus.  this
improvement depends on the goodness of mcs locks.  i developed a version of this,
but ended up using charles' much cleaner version.  there remain big problems with
the tcp and ip stack.  it's really slow.  i can't get >400MB/s on ethernet.  it seems
that the 3-way interaction between tcp:tx, tcp:rx and the user-space queues is the issue.
queue locking is very wasteful as well.  i have some student code that addresses part
of the latter problem, but it smells to me like ip/tcp.c's direct calls between tx and rx
are the real issue.

- erik



^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [9fans] fossil+venti performance question
  2015-05-06 21:26                     ` David du Colombier
  2015-05-06 21:28                       ` David du Colombier
@ 2015-05-07  3:38                       ` erik quanstrom
  1 sibling, 0 replies; 52+ messages in thread
From: erik quanstrom @ 2015-05-07  3:38 UTC (permalink / raw)
  To: 9fans

On Wed May  6 14:28:03 PDT 2015, 0intro@gmail.com wrote:
> I got it!
>
> The regression was caused by the NewReno TCP
> change on 2013-01-24.
>
> https://github.com/0intro/plan9/commit/e8406a2f44

if you have proof, i'd be interested in reproduction of the issue from the original source, or
perhaps just nix.  let me know if i can help.

- erik



^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [9fans] fossil+venti performance question
  2015-05-05 22:53               ` Aram Hăvărneanu
  2015-05-06 20:55                 ` David du Colombier
@ 2015-05-07  3:43                 ` erik quanstrom
  1 sibling, 0 replies; 52+ messages in thread
From: erik quanstrom @ 2015-05-07  3:43 UTC (permalink / raw)
  To: 9fans

On Tue May  5 15:54:45 PDT 2015, aram.h@mgk.ro wrote:
> It's pretty interesting that at least three people all got exactly
> 150kB/s on vastly different machines, both real and virtual. Maybe the
> number comes from some tick frequency?

i might suggest altering HZ and seeing if there is a throughput change in the same ratio.

- erik



^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [9fans] fossil+venti performance question
  2015-05-07  3:35                           ` erik quanstrom
@ 2015-05-07  6:15                             ` David du Colombier
  2015-05-07 13:17                               ` erik quanstrom
  0 siblings, 1 reply; 52+ messages in thread
From: David du Colombier @ 2015-05-07  6:15 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

> NOW is defined as MACHP(0)->ticks, so this is a pretty coarse timer
> that can't go backwards on intel processors.  this limits the timer's resolution to HZ,
> which on 9atom is 1000, and 100 on pretty much anything else.  further limiting the
> resolution is the tcp retransmit timers which according to presotto are
>         /* bounded twixt 0.3 and 64 seconds */
> so i really doubt the retransmit timers are resending anything.  if someone
> has a system that isn't working right, please post /net/tcp/<connectionno>/^(local remote status)
> i'd like to have a look.

The Venti listener:

cpu% cat /net/tcp/2/local
::!17034
cpu% cat /net/tcp/2/remote
::!0
cpu% cat /net/tcp/2/status
Listen qin 0 qout 0 rq 0.0 srtt 4000 mdev 0 sst 65535 cwin 1460 swin
0>>0 rwin 65535>>0 qscale 0 timer.start 10 timer.count 0 rerecv 0
katimer.start 2400 katimer.count 0

The TCP connection from Fossil to Venti on the loopback:

cpu% cat /net/tcp/3/local
127.0.0.1!57796
cpu% cat /net/tcp/3/remote
127.0.0.1!17034
cpu% cat /net/tcp/3/status
Established qin 0 qout 0 rq 0.0 srtt 80 mdev 40 sst 1048560 cwin
258192 swin 1048560>>4 rwin 1048560>>4 qscale 4 timer.start 10
timer.count 10 rerecv 0 katimer.start 2400 katimer.count 427

--
David du Colombier



^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [9fans] fossil+venti performance question
  2015-05-07  6:15                             ` David du Colombier
@ 2015-05-07 13:17                               ` erik quanstrom
  2015-05-08 16:13                                 ` David du Colombier
  0 siblings, 1 reply; 52+ messages in thread
From: erik quanstrom @ 2015-05-07 13:17 UTC (permalink / raw)
  To: 9fans

> cpu% cat /net/tcp/3/local
> 127.0.0.1!57796
> cpu% cat /net/tcp/3/remote
> 127.0.0.1!17034
> cpu% cat /net/tcp/3/status
> Established qin 0 qout 0 rq 0.0 srtt 80 mdev 40 sst 1048560 cwin
> 258192 swin 1048560>>4 rwin 1048560>>4 qscale 4 timer.start 10
> timer.count 10 rerecv 0 katimer.start 2400 katimer.count 427

hmm... large rtt, which suggests that someone is not servicing the queues
fast enough.

this is for the 1gbe machine in the room with me

11/status:Established qin 0 qout 0 rq 0.0 srtt 0 mdev 0 sst 2920 cwin 61390 swin 1048560>>4 rwin 1048560>>4 qscale 4 timer.start 10 timer.count 10 rerecv 0 katimer.start 2400 katimer.count 2101

i would suggest turning on netlog for tcp while booting and capturing the output.

sorry for the short investigation.  gotta run.

- erik



^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [9fans] fossil+venti performance question
  2015-05-07 13:17                               ` erik quanstrom
@ 2015-05-08 16:13                                 ` David du Colombier
  2015-05-08 16:39                                   ` Charles Forsyth
  0 siblings, 1 reply; 52+ messages in thread
From: David du Colombier @ 2015-05-08 16:13 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

I've enabled tcp, tcpwin and tcprxmt logs, but there isn't
anything very interesting.

tcpincoming s 127.0.0.1!53150/127.0.0.1!53150 d
127.0.0.1!17034/127.0.0.1!17034 v 4/4

Also, the issue is definitely related to the loopback.
There is no problem when using an address on /dev/ether0.

cpu% cat /net/tcp/3/local
192.168.0.100!43125
cpu% cat /net/tcp/3/remote
192.168.0.100!17034
cpu% cat /net/tcp/3/status
Established qin 0 qout 0 rq 0.0 srtt 0 mdev 0 sst 1048560 cwin 396560
swin 1048560>>4 rwin 1048560>>4 qscale 4 timer.start 10 timer.count 10
rerecv 0 katimer.start 2400 katimer.count 2106

--
David du Colombier



^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [9fans] fossil+venti performance question
  2015-05-08 16:13                                 ` David du Colombier
@ 2015-05-08 16:39                                   ` Charles Forsyth
  2015-05-08 17:16                                     ` David du Colombier
  0 siblings, 1 reply; 52+ messages in thread
From: Charles Forsyth @ 2015-05-08 16:39 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

[-- Attachment #1: Type: text/plain, Size: 353 bytes --]

On 8 May 2015 at 17:13, David du Colombier <0intro@gmail.com> wrote:

> Also, the issue is definitely related to the loopback.
> There is no problem when using an address on /dev/ether0.
>

oh. possibly the queue isn't big enough, given the window size. it's using
qpass on a Queue with Qmsg
and if the queue is full, Blocks will be discarded.
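
Not the kernel's qio code, just a toy sketch of the failure mode being
described: a non-blocking, size-limited message queue where an enqueue past
the limit silently drops the whole message, so a sender whose window allows
more than the queue limit loses data (the limit and message size below are
made up):

	#include <stdio.h>

	enum { Qlimit = 64*1024 };	/* made-up queue byte limit */

	static int qlen;	/* bytes currently queued */

	/* qpass-style non-blocking enqueue: drop if already at the limit */
	static int
	enqueue(int msglen)
	{
		if(qlen >= Qlimit)
			return -1;	/* message silently discarded */
		qlen += msglen;
		return msglen;
	}

	int
	main(void)
	{
		int i, dropped = 0;

		/* a window that allows far more than Qlimit bytes in flight */
		for(i = 0; i < 256; i++)
			if(enqueue(1460) < 0)
				dropped++;
		printf("queued %d bytes, dropped %d messages\n", qlen, dropped);
		return 0;
	}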

[-- Attachment #2: Type: text/html, Size: 727 bytes --]

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [9fans] fossil+venti performance question
  2015-05-08 16:39                                   ` Charles Forsyth
@ 2015-05-08 17:16                                     ` David du Colombier
  2015-05-08 19:24                                       ` David du Colombier
  0 siblings, 1 reply; 52+ messages in thread
From: David du Colombier @ 2015-05-08 17:16 UTC (permalink / raw)
  To: 9fans

> oh. possibly the queue isn't big enough, given the window size.
> it's using qpass on a Queue with Qmsg and if the queue is full,
> Blocks will be discarded.

I tried to increase the size of the queue, but no luck.

--
David du Colombier



^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [9fans] fossil+venti performance question
  2015-05-08 17:16                                     ` David du Colombier
@ 2015-05-08 19:24                                       ` David du Colombier
  2015-05-08 20:03                                         ` Steve Simon
                                                           ` (2 more replies)
  0 siblings, 3 replies; 52+ messages in thread
From: David du Colombier @ 2015-05-08 19:24 UTC (permalink / raw)
  To: 9fans

I've finally figured out the issue.

The slowness issue only appears on the loopback, because
it provides a 16384-byte MTU.

There is an old bug in the Plan 9 TCP stack, where the TCP
MSS doesn't take the MTU into account for incoming connections.

I originally fixed this issue in January 2015 for the Plan 9
port on Google Compute Engine. On GCE, there is an unusual
1460 MTU.

The Plan 9 TCP stack defines a default 1460 MSS corresponding
to a 1500 MTU. The MSS is then adjusted according to the MTU
for outgoing connections, but not for incoming connections.

On GCE, this issue leads to IP fragmentation, but GCE didn't
handle IP fragmentation properly, so the connections
were dropped.

On the loopback medium, I suppose this is the opposite issue.
Since the TCP stack didn't adjust the MSS for the incoming
connection, the programs sent many small 1500-byte
IP packets instead of large 16384-byte IP packets, but I don't
know why that leads to such a slowdown.
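
For illustration only (this is not the patch itself; see the links below):
the idea is simply to derive the MSS from the interface MTU, minus 40 bytes
of IPv4 and TCP headers, and clamp it to whatever the peer advertised,
instead of leaving the 1460 default in place for incoming connections:

	#include <stdio.h>

	enum { Hdrs = 40, Defmss = 1460 };	/* IPv4+TCP header bytes, default MSS */

	/* illustrative only: MSS derived from the MTU and clamped to the peer's MSS */
	static int
	mssfor(int mtu, int peermss)
	{
		int mss;

		mss = mtu - Hdrs;
		if(peermss != 0 && peermss < mss)
			mss = peermss;
		return mss;
	}

	int
	main(void)
	{
		printf("loopback (16384): should be %d, the incoming side kept %d\n",
			mssfor(16384, 0), Defmss);
		printf("ethernet (1500): %d\n", mssfor(1500, 0));
		printf("GCE (1460): should be %d; the %d default caused fragmentation\n",
			mssfor(1460, 0), Defmss);
		return 0;
	}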

Here is the patch for the Plan 9 kernel:

http://9legacy.org/9legacy/patch/9-tcp-mss.diff

And Charles' 9k kernel:

http://9legacy.org/9legacy/patch/9k-tcp-mss.diff

--
David du Colombier



^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [9fans] fossil+venti performance question
  2015-05-08 19:24                                       ` David du Colombier
@ 2015-05-08 20:03                                         ` Steve Simon
  2015-05-08 21:19                                         ` Bakul Shah
  2015-05-09  3:11                                         ` cinap_lenrek
  2 siblings, 0 replies; 52+ messages in thread
From: Steve Simon @ 2015-05-08 20:03 UTC (permalink / raw)
  To: 9fans

I confirm - my old performance is back.

Thanks very much David.

-Steve



^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [9fans] fossil+venti performance question
  2015-05-08 19:24                                       ` David du Colombier
  2015-05-08 20:03                                         ` Steve Simon
@ 2015-05-08 21:19                                         ` Bakul Shah
  2015-05-09 14:43                                           ` erik quanstrom
  2015-05-09  3:11                                         ` cinap_lenrek
  2 siblings, 1 reply; 52+ messages in thread
From: Bakul Shah @ 2015-05-08 21:19 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

On Fri, 08 May 2015 21:24:13 +0200 David du Colombier <0intro@gmail.com> wrote:
> On the loopback medium, I suppose this is the opposite issue.
> Since the TCP stack didn't adjust the MSS for the incoming
> connection, the programs sent many small 1500-byte
> IP packets instead of large 16384-byte IP packets, but I don't
> know why that leads to such a slowdown.

Looking at the first few bytes in each dir of the initial TCP
handshake (with tcpdump) I see:

        0x0000:  4500 0030 24da 0000  <= from plan9 to freebsd

        0x0000:  4500 0030 d249 4000  <= from freebsd to plan9

Looks like FreeBSD always sets the DF (don't fragment) bit
(0x40 in byte 6), while plan9 doesn't (byte 6 is 0x00).

Maybe plan9 should set the DF (don't fragment) bit in the IP
header and try to do path MTU discovery? Either by default or
under some ctl option.
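
For anyone checking their own captures, the flag lives in byte 6 of the IPv4
header; a tiny C check using the two headers shown above:

	#include <stdio.h>

	/* byte 6 of the IPv4 header holds the flags; 0x40 is DF (don't fragment) */
	static unsigned char plan9hdr[]   = { 0x45,0x00,0x00,0x30,0x24,0xda,0x00,0x00 };
	static unsigned char freebsdhdr[] = { 0x45,0x00,0x00,0x30,0xd2,0x49,0x40,0x00 };

	static int
	hasdf(unsigned char *ip)
	{
		return (ip[6] & 0x40) != 0;
	}

	int
	main(void)
	{
		printf("plan9 -> freebsd: DF=%d\n", hasdf(plan9hdr));
		printf("freebsd -> plan9: DF=%d\n", hasdf(freebsdhdr));
		return 0;
	}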



^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [9fans] fossil+venti performance question
  2015-05-08 19:24                                       ` David du Colombier
  2015-05-08 20:03                                         ` Steve Simon
  2015-05-08 21:19                                         ` Bakul Shah
@ 2015-05-09  3:11                                         ` cinap_lenrek
  2015-05-09  5:59                                           ` lucio
                                                             ` (2 more replies)
  2 siblings, 3 replies; 52+ messages in thread
From: cinap_lenrek @ 2015-05-09  3:11 UTC (permalink / raw)
  To: 9fans

do we really need to initialize tcb->mss to tcpmtu() in procsyn()?
as i see it, procsyn() is called only when tcb->state is Syn_sent,
which only should happen for client connections doing a connect, in
which case tcpsndsyn() would have initialized tcb->mss already no?

--
cinap



^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [9fans] fossil+venti performance question
  2015-05-09  3:11                                         ` cinap_lenrek
@ 2015-05-09  5:59                                           ` lucio
  2015-05-09 16:26                                             ` cinap_lenrek
  2015-05-09 16:23                                           ` erik quanstrom
  2015-05-09 16:59                                           ` erik quanstrom
  2 siblings, 1 reply; 52+ messages in thread
From: lucio @ 2015-05-09  5:59 UTC (permalink / raw)
  To: 9fans

> do we really need to initialize tcb->mss to tcpmtu() in procsyn()?
> as i see it, procsyn() is called only when tcb->state is Syn_sent,
> which only should happen for client connections doing a connect, in
> which case tcpsndsyn() would have initialized tcb->mss already no?

tcb->mss may still need to be adjusted at this point, as it is where the code says

	/* our sending max segment size cannot be bigger than what he asked for */

so at worst this does no harm that I can see.

Of course, I'm probably least qualified to pick these nits.

Lucio.




^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [9fans] fossil+venti performance question
  2015-05-08 21:19                                         ` Bakul Shah
@ 2015-05-09 14:43                                           ` erik quanstrom
  2015-05-09 17:25                                             ` Lyndon Nerenberg
  0 siblings, 1 reply; 52+ messages in thread
From: erik quanstrom @ 2015-05-09 14:43 UTC (permalink / raw)
  To: 9fans

> Looking at the first few bytes in each dir of the initial TCP
> handshake (with tcpdump) I see:
>
>         0x0000:  4500 0030 24da 0000  <= from plan9 to freebsd
>
>         0x0000:  4500 0030 d249 4000  <= from freebsd to plan9
>
> Looks like FreeBSD always sets the DF (don't fragment) bit
> (0x40 in byte 6), while plan9 doesn't (byte 6 is 0x00).
>
> Maybe plan9 should set the DF (don't fragment) bit in the IP
> header and try to do path MTU discovery? Either by default or
> under some ctl option.

easy enough until one encounters devices that don't send icmp
responses because it's not implemented, or somehow considered
"secure" that way.

- erik



^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [9fans] fossil+venti performance question
  2015-05-09  3:11                                         ` cinap_lenrek
  2015-05-09  5:59                                           ` lucio
@ 2015-05-09 16:23                                           ` erik quanstrom
  2015-05-10  4:55                                             ` erik quanstrom
  2015-05-10 20:19                                             ` cinap_lenrek
  2015-05-09 16:59                                           ` erik quanstrom
  2 siblings, 2 replies; 52+ messages in thread
From: erik quanstrom @ 2015-05-09 16:23 UTC (permalink / raw)
  To: 9fans

On Fri May  8 20:12:57 PDT 2015, cinap_lenrek@felloff.net wrote:
> do we really need to initialize tcb->mss to tcpmtu() in procsyn()?
> as i see it, procsyn() is called only when tcb->state is Syn_sent,
> which only should happen for client connections doing a connect, in
> which case tcpsndsyn() would have initialized tcb->mss already no?

i think there was a subtle reason for this, but i don't recall.  a real
reason for setting it here is because it makes the code easier to reason
about, imo.

there are a couple problems with the patch as it stands.  they are
inherited from previous mistakes.

* the setting of tpriv->stats[Mss] is bogus.  it's not shared between connections.
it is also v4 only.

* so, mss should be added to each tcp connection's status file.

* the setting of tcb->mss in tcpincoming is not correct, tcp->mss is
set by SYN, not by ACK, and may not be reset.  (see snoopy below.)

* the SYN-ACK needs to send the local mss, not echo the remote mss.
asymmetry is "fine" in the other side, even if ip/tcp.c isn't smart enough to
keep tx and rx mss seperate.  (scare quotes = untested, there may be
some performance niggles if the sender is sending legal packets larger than
tcb->mss.)

my patch to nix is below.  i haven't submitted it yet.

- erik

---
005319 ms
	ether(s=a0369f1c3af7 d=0cc47a328da4 pr=0800 ln=62)
	ip(s=10.1.1.8 d=10.1.1.9 id=ee54 frag=0000 ttl=255 pr=6 ln=48)
	tcp(s=38903 d=17766 seq=3552109414 ack=0 fl=S win=65535 ck=d68e ln=0 opt4=(mss 1460) opt3=(wscale 4) opt=NOOP)
005320 ms
	ether(s=0cc47a328da4 d=a0369f1c3af7 pr=0800 ln=62)
	ip(s=10.1.1.9 d=10.1.1.8 id=54d3 frag=0000 ttl=255 pr=6 ln=48)
	tcp(s=17766 d=38903 seq=441373010 ack=3552109415 fl=AS win=65535 ck=eadc ln=0 opt4=(mss 1460) opt3=(wscale 4) opt=NOOP)

---

/n/dump/2015/0509/sys/src/nix/ip/tcp.c:491,501 - /sys/src/nix/ip/tcp.c:491,502
  	s = (Tcpctl*)(c->ptcl);

  	return snprint(state, n,
- 		"%s qin %d qout %d rq %d.%d srtt %d mdev %d sst %lud cwin %lud swin %lud>>%d rwin %lud>>%d qscale %d timer.start %d timer.count %d rerecv %d katimer.start %d katimer.count %d\n",
+ 		"%s qin %d qout %d rq %d.%d mss %d srtt %d mdev %d sst %lud cwin %lud swin %lud>>%d rwin %lud>>%d qscale %d timer.start %d timer.count %d rerecv %d katimer.start %d katimer.count %d\n",
  		tcpstates[s->state],
  		c->rq ? qlen(c->rq) : 0,
  		c->wq ? qlen(c->wq) : 0,
  		s->nreseq, s->reseqlen,
+ 		s->mss,
  		s->srtt, s->mdev, s->ssthresh,
  		s->cwind, s->snd.wnd, s->rcv.scale, s->rcv.wnd, s->snd.scale,
  		s->qscale,
/n/dump/2015/0509/sys/src/nix/ip/tcp.c:843,854 - /sys/src/nix/ip/tcp.c:844,857

  /* mtu (- TCP + IP hdr len) of 1st hop */
  static int
- tcpmtu(Proto *tcp, uchar *addr, int version, uint *scale)
+ tcpmtu(Proto *tcp, uchar *addr, int version, uint reqmss, uint *scale)
  {
+ 	Tcppriv *tpriv;
  	Ipifc *ifc;
  	int mtu;

  	ifc = findipifc(tcp->f, addr, 0);
+ 	tpriv = tcp->priv;
  	switch(version){
  	default:
  	case V4:
/n/dump/2015/0509/sys/src/nix/ip/tcp.c:855,865 - /sys/src/nix/ip/tcp.c:858,870
  		mtu = DEF_MSS;
  		if(ifc != nil)
  			mtu = ifc->maxtu - ifc->m->hsize - (TCP4_PKT + TCP4_HDRSIZE);
+ 		tpriv->stats[Mss] = mtu;
  		break;
  	case V6:
  		mtu = DEF_MSS6;
  		if(ifc != nil)
  			mtu = ifc->maxtu - ifc->m->hsize - (TCP6_PKT + TCP6_HDRSIZE);
+ 		tpriv->stats[Mss] = mtu + (TCP6_PKT + TCP6_HDRSIZE) - (TCP4_PKT + TCP4_HDRSIZE);
  		break;
  	}
  	/*
/n/dump/2015/0509/sys/src/nix/ip/tcp.c:868,873 - /sys/src/nix/ip/tcp.c:873,882
  	 */
  	*scale = Defadvscale;

+ 	/* our sending max segment size cannot be bigger than what he asked for */
+ 	if(reqmss != 0 && reqmss < mtu)
+ 		mtu = reqmss;
+
  	return mtu;
  }

/n/dump/2015/0509/sys/src/nix/ip/tcp.c:1300,1307 - /sys/src/nix/ip/tcp.c:1309,1314
  static void
  tcpsndsyn(Conv *s, Tcpctl *tcb)
  {
- 	Tcppriv *tpriv;
-
  	tcb->iss = (nrand(1<<16)<<16)|nrand(1<<16);
  	tcb->rttseq = tcb->iss;
  	tcb->snd.wl2 = tcb->iss;
/n/dump/2015/0509/sys/src/nix/ip/tcp.c:1314,1322 - /sys/src/nix/ip/tcp.c:1321,1327
  	tcb->sndsyntime = NOW;

  	/* set desired mss and scale */
- 	tcb->mss = tcpmtu(s->p, s->laddr, s->ipversion, &tcb->scale);
- 	tpriv = s->p->priv;
- 	tpriv->stats[Mss] = tcb->mss;
+ 	tcb->mss = tcpmtu(s->p, s->laddr, s->ipversion, 0, &tcb->scale);
  }

  void
/n/dump/2015/0509/sys/src/nix/ip/tcp.c:1492,1498 - /sys/src/nix/ip/tcp.c:1497,1503
  	seg.ack = lp->irs+1;
  	seg.flags = SYN|ACK;
  	seg.urg = 0;
- 	seg.mss = tcpmtu(tcp, lp->laddr, lp->version, &scale);
+ 	seg.mss = tcpmtu(tcp, lp->laddr, lp->version, 0, &scale);	/* send our mss, not lp->mss */
  	seg.wnd = QMAX;

  	/* if the other side set scale, we should too */
/n/dump/2015/0509/sys/src/nix/ip/tcp.c:1767,1777 - /sys/src/nix/ip/tcp.c:1772,1779
  	tcb->flgcnt = 0;
  	tcb->flags |= SYNACK;

- 	/* our sending max segment size cannot be bigger than what he asked for */
- 	if(lp->mss != 0 && lp->mss < tcb->mss) {
- 		tcb->mss = lp->mss;
- 		tpriv->stats[Mss] = tcb->mss;
- 	}
+ 	/* per rfc, we can't set the mss any more */
+ //	tcb->mss = tcpmtu(s->p, lp->laddr, lp->version, lp->mss, &tcb->scale);

  	/* window scaling */
  	tcpsetscale(new, tcb, lp->rcvscale, lp->sndscale);
/n/dump/2015/0509/sys/src/nix/ip/tcp.c:3014,3020 - /sys/src/nix/ip/tcp.c:3016,3021
  procsyn(Conv *s, Tcp *seg)
  {
  	Tcpctl *tcb;
- 	Tcppriv *tpriv;

  	tcb = (Tcpctl*)s->ptcl;
  	tcb->flags |= FORCE;
/n/dump/2015/0509/sys/src/nix/ip/tcp.c:3026,3036 - /sys/src/nix/ip/tcp.c:3027,3033
  	tcb->irs = seg->seq;

  	/* our sending max segment size cannot be bigger than what he asked for */
- 	if(seg->mss != 0 && seg->mss < tcb->mss) {
- 		tcb->mss = seg->mss;
- 		tpriv = s->p->priv;
- 		tpriv->stats[Mss] = tcb->mss;
- 	}
+ 	tcb->mss = tcpmtu(s->p, s->laddr, s->ipversion, seg->mss, &tcb->scale);

  	tcb->snd.wnd = seg->wnd;
  	initialwindow(tcb);



^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [9fans] fossil+venti performance question
  2015-05-09  5:59                                           ` lucio
@ 2015-05-09 16:26                                             ` cinap_lenrek
  0 siblings, 0 replies; 52+ messages in thread
From: cinap_lenrek @ 2015-05-09 16:26 UTC (permalink / raw)
  To: 9fans

yes, but i was not referring to the adjusting, which isn't changed here, only
the tcpmtu() call that got added.

yes, it *should* not make any difference, but maybe we're missing
something. at worst it makes the code more confusing and causes bugs in
the future, because one of the initializations of mss is a lie without
any effect.

--
cinap



^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [9fans] fossil+venti performance question
  2015-05-09  3:11                                         ` cinap_lenrek
  2015-05-09  5:59                                           ` lucio
  2015-05-09 16:23                                           ` erik quanstrom
@ 2015-05-09 16:59                                           ` erik quanstrom
  2 siblings, 0 replies; 52+ messages in thread
From: erik quanstrom @ 2015-05-09 16:59 UTC (permalink / raw)
  To: 9fans

On Fri May  8 20:12:57 PDT 2015, cinap_lenrek@felloff.net wrote:
> do we really need to initialize tcb->mss to tcpmtu() in procsyn()?
> as i see it, procsyn() is called only when tcb->state is Syn_sent,
> which only should happen for client connections doing a connect, in
> which case tcpsndsyn() would have initialized tcb->mss already no?

yes, we should.  the bug is that we confuse send mss and receive mss.
the sender's mss is the one we need to respect here.
tcpsndsyn() should not set the mss; the mss it calculates is for rx.

- erik



^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [9fans] fossil+venti performance question
  2015-05-09 14:43                                           ` erik quanstrom
@ 2015-05-09 17:25                                             ` Lyndon Nerenberg
  2015-05-09 17:30                                               ` Devon H. O'Dell
  2015-05-09 18:20                                               ` Bakul Shah
  0 siblings, 2 replies; 52+ messages in thread
From: Lyndon Nerenberg @ 2015-05-09 17:25 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

[-- Attachment #1: Type: text/plain, Size: 659 bytes --]


On May 9, 2015, at 7:43 AM, erik quanstrom <quanstro@quanstro.net> wrote:

> easy enough until one encounters devices that don't send icmp
> responses because it's not implemented, or somehow considered
> "secure" that way.

Oddly enough, I don't see this 'problem' in the real world.  And FreeBSD is far from being alone in the always-set-DF bit.

The only place this bites is when you run into tiny shops with homegrown firewalls configured by people who don't understand networking or security.  Me, I consider it a feature that these sites self-select themselves off the network.  I'm certainly no worse off for not being able to talk to them.

[-- Attachment #2: Message signed with OpenPGP using GPGMail --]
[-- Type: application/pgp-signature, Size: 817 bytes --]

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [9fans] fossil+venti performance question
  2015-05-09 17:25                                             ` Lyndon Nerenberg
@ 2015-05-09 17:30                                               ` Devon H. O'Dell
  2015-05-09 17:35                                                 ` Lyndon Nerenberg
  2015-05-09 18:20                                               ` Bakul Shah
  1 sibling, 1 reply; 52+ messages in thread
From: Devon H. O'Dell @ 2015-05-09 17:30 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

2015-05-09 10:25 GMT-07:00 Lyndon Nerenberg <lyndon@orthanc.ca>:
>
>
> On May 9, 2015, at 7:43 AM, erik quanstrom <quanstro@quanstro.net> wrote:
>
> > easy enough until one encounters devices that don't send icmp
> > responses because it's not implemented, or somehow considered
> > "secure" that way.
>
> Oddly enough, I don't see this 'problem' in the real world.  And FreeBSD is far from being alone in the always-set-DF bit.
>
> The only place this bites is when you run into tiny shops with homegrown firewalls configured by people who don't understand networking or security.  Me, I consider it a feature that these sites self-select themselves off the network.  I'm certainly no worse off for not being able to talk to them.

Or when your client is on a cell phone. Cell networks are the worst.



^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [9fans] fossil+venti performance question
  2015-05-09 17:30                                               ` Devon H. O'Dell
@ 2015-05-09 17:35                                                 ` Lyndon Nerenberg
  2015-05-09 21:54                                                   ` Devon H. O'Dell
  0 siblings, 1 reply; 52+ messages in thread
From: Lyndon Nerenberg @ 2015-05-09 17:35 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

[-- Attachment #1: Type: text/plain, Size: 314 bytes --]


On May 9, 2015, at 10:30 AM, Devon H. O'Dell <devon.odell@gmail.com> wrote:

> Or when your client is on a cell phone. Cell networks are the worst.

Really?  Quite often I slave my laptop to my phone's LTE connection, and I never have problems with PMTU.  Both here (across western Canada) and in the UK.


[-- Attachment #2: Message signed with OpenPGP using GPGMail --]
[-- Type: application/pgp-signature, Size: 817 bytes --]

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [9fans] fossil+venti performance question
  2015-05-09 17:25                                             ` Lyndon Nerenberg
  2015-05-09 17:30                                               ` Devon H. O'Dell
@ 2015-05-09 18:20                                               ` Bakul Shah
  1 sibling, 0 replies; 52+ messages in thread
From: Bakul Shah @ 2015-05-09 18:20 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs



> On May 9, 2015, at 10:25 AM, Lyndon Nerenberg <lyndon@orthanc.ca> wrote:
> 
> 
>> On May 9, 2015, at 7:43 AM, erik quanstrom <quanstro@quanstro.net> wrote:
>> 
>> easy enough until one encounters devices that don't send icmp
>> responses because it's not implemented, or somehow considered
>> "secure" that way.
> 
> Oddly enough, I don't see this 'problem' in the real world.  And FreeBSD is far from being alone in the always-set-DF bit.
> 
> The only place this bites is when you run into tiny shops with homegrown firewalls configured by people who don't understand networking or security.  Me, I consider it a feature that these sites self-select themselves off the network.  I'm certainly no worse off for not being able to talk to them.

Network admins not understanding ICMP was far more common 20 years ago. Now the game has changed. At any rate no harm in trying PMTU discovery as an option (other than a SMOP).


^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [9fans] fossil+venti performance question
  2015-05-09 17:35                                                 ` Lyndon Nerenberg
@ 2015-05-09 21:54                                                   ` Devon H. O'Dell
  0 siblings, 0 replies; 52+ messages in thread
From: Devon H. O'Dell @ 2015-05-09 21:54 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

2015-05-09 10:35 GMT-07:00 Lyndon Nerenberg <lyndon@orthanc.ca>:
>
> On May 9, 2015, at 10:30 AM, Devon H. O'Dell <devon.odell@gmail.com> wrote:
>
>> Or when your client is on a cell phone. Cell networks are the worst.
>
> Really?  Quite often I slave my laptop to my phone's LTE connection, and I never have problems with PMTU.  Both here (across western Canada) and in the UK.

There are lots of hacks all over the Internet to deal with various
brokenness on the carrier<->carrier side of things where one end is a
cell network. Haven't seen anything come up super recently, but had to
help debug some brokenness as recently as a year and a half ago that
turned out to be some cell network with really old hardware that
didn't do PMTU correctly, causing TLS connections to drop or die. IIRC
this particular case was in France, but I also seem to recall the same
issue in northern England and perhaps Ireland.



^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [9fans] fossil+venti performance question
  2015-05-09 16:23                                           ` erik quanstrom
@ 2015-05-10  4:55                                             ` erik quanstrom
  2015-05-10  5:07                                               ` erik quanstrom
  2015-05-10 20:19                                             ` cinap_lenrek
  1 sibling, 1 reply; 52+ messages in thread
From: erik quanstrom @ 2015-05-10  4:55 UTC (permalink / raw)
  To: 9fans

for what it's worth, the tcp from the original newreno work does not have the mtu
bug.  on an 8-processor system i have around here i get

bwc; while() nettest -a 127.1
tcp!127.0.0.1!40357 count 100000; 819200000 bytes in 1.505948 s @ 519 MB/s (0ms)
tcp!127.0.0.1!47983 count 100000; 819200000 bytes in 1.377984 s @ 567 MB/s (0ms)
tcp!127.0.0.1!53197 count 100000; 819200000 bytes in 1.299967 s @ 601 MB/s (0ms)
tcp!127.0.0.1!61569 count 100000; 819200000 bytes in 1.418073 s @ 551 MB/s (0ms)

however, after fixing things so the initial cwind isn't hosed, i get a little better story:

bwc; while() nettest -a 127.1
tcp!127.0.0.1!54261 count 100000; 819200000 bytes in .5947659 s @ 1.31e+03 MB/s (0ms)

boo yah!  not bad for trying to clean up some constants.

- erik



^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [9fans] fossil+venti performance question
  2015-05-10  4:55                                             ` erik quanstrom
@ 2015-05-10  5:07                                               ` erik quanstrom
  2015-05-10 17:57                                                 ` David du Colombier
  0 siblings, 1 reply; 52+ messages in thread
From: erik quanstrom @ 2015-05-10  5:07 UTC (permalink / raw)
  To: 9fans

> however, after fixing things so the initial cwind isn't hosed, i get a little better story:

so, actually, i think this is the root cause.  the initial cwind is mis-set for loopback.
i bet that the symptom folks will see is that /net/tcp/stats shows fragmentation when
performance sucks.  evidently there is a backoff bug in sources' tcp, too.
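
for reference, the rfc 3390-style initial window is just a couple of constants
hung off the mss.  a standalone sketch of the arithmetic (not the kernel code;
the sample mss values below are made up):

	/* standalone sketch of an rfc 3390-style initial congestion window:
	 * IW = min(4*mss, max(2*mss, 4380 bytes)).  illustration only. */
	#include <stdio.h>

	static int
	initcwind(int mss)
	{
		int cw;

		cw = 2*mss;
		if(cw < 4380)
			cw = 4380;
		if(cw > 4*mss)
			cw = 4*mss;
		return cw;
	}

	int
	main(void)
	{
		int i;
		int mss[] = { 536, 1460, 8948, 16340 };	/* assumed sample values */

		for(i = 0; i < 4; i++)
			printf("mss %5d -> initial cwind %6d bytes\n", mss[i], initcwind(mss[i]));
		return 0;
	}

the point is just that a sane initial window scales with the mss instead of
being a fixed small constant.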

i'd love confirmation of this.

- erik



^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [9fans] fossil+venti performance question
  2015-05-10  5:07                                               ` erik quanstrom
@ 2015-05-10 17:57                                                 ` David du Colombier
  2015-05-10 20:18                                                   ` erik quanstrom
  0 siblings, 1 reply; 52+ messages in thread
From: David du Colombier @ 2015-05-10 17:57 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

>> however, after fixing things so the initial cwind isn't hosed, i get a little better story:
>
> so, actually, i think this is the root cause.  the intial cwind is misset for loopback.
> i but that the symptom folks will see is that /net/tcp/stats shows fragmentation when
> performance sucks.  evidently there is a backoff bug in sources' tcp, too.

What is your cwind change?

--
David du Colombier



^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [9fans] fossil+venti performance question
  2015-05-10 17:57                                                 ` David du Colombier
@ 2015-05-10 20:18                                                   ` erik quanstrom
  0 siblings, 0 replies; 52+ messages in thread
From: erik quanstrom @ 2015-05-10 20:18 UTC (permalink / raw)
  To: 9fans

On Sun May 10 10:58:55 PDT 2015, 0intro@gmail.com wrote:
> >> however, after fixing things so the initial cwind isn't hosed, i get a little better story:
> >
> > so, actually, i think this is the root cause.  the intial cwind is misset for loopback.
> > i but that the symptom folks will see is that /net/tcp/stats shows fragmentation when
> > performance sucks.  evidently there is a backoff bug in sources' tcp, too.
> 
> What is your cwind change?
> 

the patch is here: /n/atom/patch/tcpmss

note i applied a patch to nettest(8) to simulate an rpc-style protocol.  i still get ~500MB/s
with my test machine simulating rpc-style transactions, or 15µs per 8k transaction.

we're at least an order of magnitude off the performance mark for this.

a similar test using pipe(2) shows a latency of 5.7µs (!) for a pipe-based rpc, which limits
us to about 1.4 GB/s for 8k pipe-based ping-pong rpc.
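
the arithmetic is just payload over round-trip latency; a trivial standalone
check, using the payload size and latencies quoted above:

	/* back-of-the-envelope: ping-pong rpc throughput = payload / latency.
	 * numbers are the ones quoted above. */
	#include <stdio.h>

	int
	main(void)
	{
		double payload = 8192;			/* bytes per rpc */
		double lat[] = { 15e-6, 5.7e-6 };	/* seconds: tcp loopback, pipe(2) */
		char *name[] = { "tcp loopback", "pipe(2)" };
		int i;

		for(i = 0; i < 2; i++)
			printf("%-12s %4.1fµs/rpc -> %6.0f MB/s\n",
				name[i], lat[i]*1e6, payload/lat[i]/1e6);
		return 0;
	}

8192 bytes every 15µs is about 546 MB/s; every 5.7µs, about 1.4 GB/s.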

- erik



^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [9fans] fossil+venti performance question
  2015-05-09 16:23                                           ` erik quanstrom
  2015-05-10  4:55                                             ` erik quanstrom
@ 2015-05-10 20:19                                             ` cinap_lenrek
  2015-05-10 20:51                                               ` erik quanstrom
  1 sibling, 1 reply; 52+ messages in thread
From: cinap_lenrek @ 2015-05-10 20:19 UTC (permalink / raw)
  To: 9fans

> * the SYN-ACK needs to send the local mss, not echo the remote mss.
> asymmetry is "fine" in the other side, even if ip/tcp.c isn't smart enough to
> keep tx and rx mss seperate.  (scare quotes = untested, there may be
> some performance niggles if the sender is sending legal packets larger than
> tcb->mss.)

that is what it already does as far as i can see. on the server side, we receive a
SYN, put it in limbo and reply with SYN|ACK (sndsynack()), sending our local
mss straight from tcpmtu(), with no adjustment. at this point there's no connection or tcb, as
everything is still in limbo. only once we receive the ACK does tcpincoming() get called, which
pulls the info we got so far (including the mss sent by the client in the SYN packet) out of
limbo and sets up a connection with its tcb.

to summarize what happens on the server for an incoming connection:

1.a) tcpiput() gets a SYN packet for a Listening connection, calls limbo().
1.b) limbo() saves the info (including mss) from the SYN in the limbo database and calls sndsynack().
1.c) sndsynack() sends a SYN|ACK packet with the mss option set from tcpmtu(), without any adjustment.

2.a) tcpiput() gets an ACK packet for the Listening connection, calls tcpincoming().
2.b) tcpincoming() looks in limbo, finds lp, and makes a new connection.
3.c) initialize our connection's tcb->mss.

> * the setting of tcb->mss in tcpincoming is not correct, tcp->mss is
> set by SYN, not by ACK, and may not be reset.  (see snoopy below.)

you say we shouldn't initialize tcb->mss in 3.c and shouldn't use the mss from the
initial SYN to adjust it. i don't understand why not, as i don't see where it
would be initialized otherwise. it appears that is what the initial patch
from david was meant to fix, which made sense to me.

as far as i can see, procsyn() is unrelated to server-side incoming
connections. it only gets called on behalf of an outgoing client connect,
when the connection is in Syn_sent state, and processes the SYN|ACK that
was generated by the process described in 1.c above.
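
to make 3.c concrete, a toy standalone illustration (not the ip/tcp.c code;
the numbers are invented): each side advertises the mss it can receive, and
the sender clamps what it sends to what the peer advertised.

	/* toy illustration of the mss handshake discussed above; standalone,
	 * not the kernel code.  the send mss is the local value, clamped to
	 * what the peer advertised in its SYN. */
	#include <stdio.h>

	typedef struct Tcb Tcb;
	struct Tcb {
		int mss;	/* what we will send per segment */
	};

	static void
	setsndmss(Tcb *tcb, int localmss, int peermss)
	{
		tcb->mss = localmss;
		/* our sending max segment size cannot be bigger than what the peer asked for */
		if(peermss != 0 && peermss < tcb->mss)
			tcb->mss = peermss;
	}

	int
	main(void)
	{
		Tcb tcb;

		setsndmss(&tcb, 16340, 1460);	/* invented: large local mss, ethernet-ish peer */
		printf("send mss clamped to %d\n", tcb.mss);
		return 0;
	}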

--
cinap



^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [9fans] fossil+venti performance question
  2015-05-10 20:19                                             ` cinap_lenrek
@ 2015-05-10 20:51                                               ` erik quanstrom
  2015-05-10 21:34                                                 ` cinap_lenrek
  0 siblings, 1 reply; 52+ messages in thread
From: erik quanstrom @ 2015-05-10 20:51 UTC (permalink / raw)
  To: 9fans

> 2.a) tcpiput() gets a ACK packet for Listening connection, calls tcpincoming().
> 2.b) tcpincoming() looks in limbo, finds lp. and makes new connection.
> 3.c) initialize our connections tcb->mss.
>
> > * the setting of tcb->mss in tcpincoming is not correct, tcp->mss is
> > set by SYN, not by ACK, and may not be reset.  (see snoopy below.)
>
> you say we shouldnt initialize tcb->mss in 3.c and not use the mss from the
> initial SYN to adjust it. i dont understand why not as i dont see where it
> would be initialized otherwise. it appears that was what the initial patch
> from david was about to fix which made sense to me.

that was the opposite of what i was saying.  the issue was i misread tcpincoming().

- erik



^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [9fans] fossil+venti performance question
  2015-05-10 20:51                                               ` erik quanstrom
@ 2015-05-10 21:34                                                 ` cinap_lenrek
  2015-05-11  1:23                                                   ` erik quanstrom
  0 siblings, 1 reply; 52+ messages in thread
From: cinap_lenrek @ 2015-05-10 21:34 UTC (permalink / raw)
  To: 9fans

how is this the opposite? your patch shows the tcb->mss init being removed
completely from tcpincoming().

- 	/* our sending max segment size cannot be bigger than what he asked for */
- 	if(lp->mss != 0 && lp->mss < tcb->mss) {
- 		tcb->mss = lp->mss;
- 		tpriv->stats[Mss] = tcb->mss;
- 	}
+ 	/* per rfc, we can't set the mss any more */
+ //	tcb->mss = tcpmtu(s->p, lp->laddr, lp->version, lp->mss, &tcb->scale);

--
cinap



^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [9fans] fossil+venti performance question
  2015-05-10 21:34                                                 ` cinap_lenrek
@ 2015-05-11  1:23                                                   ` erik quanstrom
  0 siblings, 0 replies; 52+ messages in thread
From: erik quanstrom @ 2015-05-11  1:23 UTC (permalink / raw)
  To: 9fans

On Sun May 10 14:36:15 PDT 2015, cinap_lenrek@felloff.net wrote:
> how is this the opposite? your patch shows the tcb->mss init being removed
> completely from tcpincoming().
>
> - 	/* our sending max segment size cannot be bigger than what he asked for */
> - 	if(lp->mss != 0 && lp->mss < tcb->mss) {
> - 		tcb->mss = lp->mss;
> - 		tpriv->stats[Mss] = tcb->mss;
> - 	}
> + 	/* per rfc, we can't set the mss any more */
> + //	tcb->mss = tcpmtu(s->p, lp->laddr, lp->version, lp->mss, &tcb->scale);

i haven't updated the patch.

- erik



^ permalink raw reply	[flat|nested] 52+ messages in thread

end of thread, other threads:[~2015-05-11  1:23 UTC | newest]

Thread overview: 52+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-05-04  9:32 [9fans] fossil+venti performance question KADOTA Kyohei
2015-05-04 16:10 ` Anthony Sorace
2015-05-04 18:11   ` Aram Hăvărneanu
2015-05-04 18:51     ` David du Colombier
2015-05-05 14:29       ` Sergey Zhilkin
2015-05-05 15:05       ` Charles Forsyth
2015-05-05 15:38         ` David du Colombier
2015-05-05 22:23           ` Charles Forsyth
2015-05-05 22:29             ` cinap_lenrek
2015-05-05 22:33             ` David du Colombier
2015-05-05 22:53               ` Aram Hăvărneanu
2015-05-06 20:55                 ` David du Colombier
2015-05-06 21:17                   ` Charles Forsyth
2015-05-06 21:26                     ` David du Colombier
2015-05-06 21:28                       ` David du Colombier
2015-05-06 22:28                         ` Charles Forsyth
2015-05-07  3:35                           ` erik quanstrom
2015-05-07  6:15                             ` David du Colombier
2015-05-07 13:17                               ` erik quanstrom
2015-05-08 16:13                                 ` David du Colombier
2015-05-08 16:39                                   ` Charles Forsyth
2015-05-08 17:16                                     ` David du Colombier
2015-05-08 19:24                                       ` David du Colombier
2015-05-08 20:03                                         ` Steve Simon
2015-05-08 21:19                                         ` Bakul Shah
2015-05-09 14:43                                           ` erik quanstrom
2015-05-09 17:25                                             ` Lyndon Nerenberg
2015-05-09 17:30                                               ` Devon H. O'Dell
2015-05-09 17:35                                                 ` Lyndon Nerenberg
2015-05-09 21:54                                                   ` Devon H. O'Dell
2015-05-09 18:20                                               ` Bakul Shah
2015-05-09  3:11                                         ` cinap_lenrek
2015-05-09  5:59                                           ` lucio
2015-05-09 16:26                                             ` cinap_lenrek
2015-05-09 16:23                                           ` erik quanstrom
2015-05-10  4:55                                             ` erik quanstrom
2015-05-10  5:07                                               ` erik quanstrom
2015-05-10 17:57                                                 ` David du Colombier
2015-05-10 20:18                                                   ` erik quanstrom
2015-05-10 20:19                                             ` cinap_lenrek
2015-05-10 20:51                                               ` erik quanstrom
2015-05-10 21:34                                                 ` cinap_lenrek
2015-05-11  1:23                                                   ` erik quanstrom
2015-05-09 16:59                                           ` erik quanstrom
2015-05-06 22:35                         ` Steven Stallion
2015-05-06 23:47                           ` Charles Forsyth
2015-05-07  3:38                       ` erik quanstrom
2015-05-07  3:43                 ` erik quanstrom
2015-05-05 15:07     ` KADOTA Kyohei
2015-05-05 14:47   ` KADOTA Kyohei
2015-05-05 15:46     ` steve
2015-05-05 15:54       ` David du Colombier
