* [9fans] fossil+venti performance question @ 2015-05-04 9:32 KADOTA Kyohei 2015-05-04 16:10 ` Anthony Sorace 0 siblings, 1 reply; 52+ messages in thread From: KADOTA Kyohei @ 2015-05-04 9:32 UTC (permalink / raw) To: 9fans

Hello, fans.

I'm running Plan 9 (labs) on a public QEMU/KVM service. My Plan 9 system has a slow read performance problem. I ran 'iostats md5sum /386/9pcf' with DMA on, and the read result is 150KB/s, but write performance is fast.

My Plan 9 system has a 200GB HDD, formatted with fossil+venti. The disk layout is:
- 9fat 100MB
- nvram 512B
- fossil 31.82GB
- arenas 159.11GB
- isect 7.95GB
- bloom 512MB
- swap 512MB

I also tried two other installations:
1) 200GB HDD with fossil only.
2) 100GB HDD with fossil+venti.
Read performance is fast (about 15MB/s) on both installations.

Could you tell me the reason?

^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [9fans] fossil+venti performance question 2015-05-04 9:32 [9fans] fossil+venti performance question KADOTA Kyohei @ 2015-05-04 16:10 ` Anthony Sorace 2015-05-04 18:11 ` Aram Hăvărneanu 2015-05-05 14:47 ` KADOTA Kyohei 0 siblings, 2 replies; 52+ messages in thread From: Anthony Sorace @ 2015-05-04 16:10 UTC (permalink / raw) To: Fans of the OS Plan 9 from Bell Labs The reason, in general: In a fossil+venti setup, fossil runs (basically) as a cache for venti. If your access just hits fossil, it’ll be quick; if not, you hit the (significantly slower) venti. I bet if you re-run the same test twice in a row, you’re going to see dramatically improved performance. Try it. If that’s true, the question is really one of venti performance; if not, you may have another system config issue. There are various changes you can make to how venti uses disk/memory that can speed things up, but I don’t have a good handle on which to suggest first. Your write performance in that test isn’t really relevant: they’re not hitting the file system at all. I’m not sure why you’d see a difference in a fossil+venti setup of a different size, but the partition size relationships, and the in-memory cache size relationships, are what’s mostly important. a ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [9fans] fossil+venti performance question 2015-05-04 16:10 ` Anthony Sorace @ 2015-05-04 18:11 ` Aram Hăvărneanu 2015-05-04 18:51 ` David du Colombier 2015-05-05 15:07 ` KADOTA Kyohei 2015-05-05 14:47 ` KADOTA Kyohei 1 sibling, 2 replies; 52+ messages in thread From: Aram Hăvărneanu @ 2015-05-04 18:11 UTC (permalink / raw) To: Fans of the OS Plan 9 from Bell Labs I have seen the same problem a few years back on about half of my machines. The other half were fine. There was a 1000x difference in performance between the good and bad machines. I have spent some time debugging this, but unfortunately, I couldn't find the root cause, and I just stopped using fossil. It only happens when fossil is used with Plan 9 venti, it does not happen when fossil is used by itself, and it does not happen when fossil is used with plan9port venti. In all these scenarios, the data is present in fossil, it does not need to be fetched from venti, so the venti performance is not the issue. The problem is that the mere presence of Plan 9 venti induces this problem somewhere else (fossil or the kernel). -- Aram Hăvărneanu ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [9fans] fossil+venti performance question 2015-05-04 18:11 ` Aram Hăvărneanu @ 2015-05-04 18:51 ` David du Colombier 2015-05-05 14:29 ` Sergey Zhilkin 2015-05-05 15:05 ` Charles Forsyth 2015-05-05 15:07 ` KADOTA Kyohei 1 sibling, 2 replies; 52+ messages in thread From: David du Colombier @ 2015-05-04 18:51 UTC (permalink / raw) To: Fans of the OS Plan 9 from Bell Labs I'm experiencing the same issue as well. When I launch vacfs on the same machine as Venti, reading is very slow. When I launch vacfs on another Plan 9 or Unix machine, reading is fast. I've just made some measurements when reading a file: Vacfs running on the same machine as Venti: 151 KB/s Vacfs running on another machine: 5131 KB/s -- David du Colombier ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [9fans] fossil+venti performance question 2015-05-04 18:51 ` David du Colombier @ 2015-05-05 14:29 ` Sergey Zhilkin 2015-05-05 15:05 ` Charles Forsyth 1 sibling, 0 replies; 52+ messages in thread From: Sergey Zhilkin @ 2015-05-05 14:29 UTC (permalink / raw) To: Fans of the OS Plan 9 from Bell Labs Hello! IMHO, placing fossil, venti, isect, bloom and swap on a single drive is a bad idea. As written in http://plan9.bell-labs.com/sys/doc/venti/venti.html - "The prototype Venti server is implemented for the Plan 9 operating system in about 10,000 lines of C. The server runs on a dedicated dual 550Mhz Pentium III processor system with 2 Gbyte of memory and is accessed over a 100Mbs Ethernet network. The data log is stored on a 500 Gbyte MaxTronic IDE Raid 5 Array and the index resides on a string of 8 Seagate Cheetah 18XL 9 Gbyte SCSI drives." A good idea is to store isect on multiple SSD drives :) to speed up search. My small installation - 80GB (PATA) 9fat, fossil, swap + 40GB isect, bloom drive (PATA) + 1TB SATA as arenas. No RAID. 2015-05-04 21:51 GMT+03:00 David du Colombier <0intro@gmail.com>: > I'm experiencing the same issue as well. > > When I launch vacfs on the same machine as Venti, > reading is very slow. When I launch vacfs on another > Plan 9 or Unix machine, reading is fast. > > I've just made some measurements when reading a file: > > Vacfs running on the same machine as Venti: 151 KB/s > Vacfs running on another machine: 5131 KB/s > > -- > David du Colombier > > -- С наилучшими пожеланиями Жилкин Сергей With best regards Zhilkin Sergey ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [9fans] fossil+venti performance question 2015-05-04 18:51 ` David du Colombier 2015-05-05 14:29 ` Sergey Zhilkin @ 2015-05-05 15:05 ` Charles Forsyth 2015-05-05 15:38 ` David du Colombier 1 sibling, 1 reply; 52+ messages in thread From: Charles Forsyth @ 2015-05-05 15:05 UTC (permalink / raw) To: Fans of the OS Plan 9 from Bell Labs On 4 May 2015 at 19:51, David du Colombier <0intro@gmail.com> wrote: > > I've just made some measurements when reading a file: > > Vacfs running on the same machine as Venti: 151 KB/s > Vacfs running on another machine: 5131 KB/s How many times do you time it on each machine? ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [9fans] fossil+venti performance question 2015-05-05 15:05 ` Charles Forsyth @ 2015-05-05 15:38 ` David du Colombier 2015-05-05 22:23 ` Charles Forsyth 0 siblings, 1 reply; 52+ messages in thread From: David du Colombier @ 2015-05-05 15:38 UTC (permalink / raw) To: Fans of the OS Plan 9 from Bell Labs >> I've just made some measurements when reading a file: >> >> Vacfs running on the same machine as Venti: 151 KB/s >> Vacfs running on another machine: 5131 KB/s > > > How many times do you time it on each machine? Maybe ten times. The results are always the same ~5%. Also, I restarted vacfs between each try. It's easy to reproduce this issue with vacfs. I think anyone running Venti on Plan 9 can observe this problem. -- David du Colombier ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [9fans] fossil+venti performance question 2015-05-05 15:38 ` David du Colombier @ 2015-05-05 22:23 ` Charles Forsyth 2015-05-05 22:29 ` cinap_lenrek 2015-05-05 22:33 ` David du Colombier 0 siblings, 2 replies; 52+ messages in thread From: Charles Forsyth @ 2015-05-05 22:23 UTC (permalink / raw) To: Fans of the OS Plan 9 from Bell Labs On 5 May 2015 at 16:38, David du Colombier <0intro@gmail.com> wrote: > > How many times do you time it on each machine? > > Maybe ten times. The results are always the same ~5%. > Also, I restarted vacfs between each try. It was the effect of the ram caches that prompted the question. My experience is similar to Steve's: it was faster, and now it's initially very very slow. I looked at changes from that version of venti to this, and I didn't see anything that would cause that. (The problem could be outside venti, but I looked at some possibly relevant kernel changes too.) Note that the raw drive speed on my venti machine is fine (no doubt it could be better, but it's fine). I convinced myself through experiments that the problem was with venti, not fossil. I used some debugging code in venti and had the impression that it took a surprisingly long time to handle each request: that the time was in venti. The effect was similar to that of a lost interrupt for a device driver. I used ratrace on it, but didn't spot an obvious culprit. I was tempted to rip out or disable the drive scheduling code in venti to see what happened, but not for the first time I ran out of time and had to get back to some other work. One thing I didn't know was that the results were different when fossil was on a different machine. I thought I'd tried that with vacfs myself, but apparently not or mine was as slow as when on the same machine. ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [9fans] fossil+venti performance question 2015-05-05 22:23 ` Charles Forsyth @ 2015-05-05 22:29 ` cinap_lenrek 2015-05-05 22:33 ` David du Colombier 1 sibling, 0 replies; 52+ messages in thread From: cinap_lenrek @ 2015-05-05 22:29 UTC (permalink / raw) To: 9fans semlocks? anyway, should not be too hard to figure out with /n/dump -- cinap ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [9fans] fossil+venti performance question 2015-05-05 22:23 ` Charles Forsyth 2015-05-05 22:29 ` cinap_lenrek @ 2015-05-05 22:33 ` David du Colombier 2015-05-05 22:53 ` Aram Hăvărneanu 1 sibling, 1 reply; 52+ messages in thread From: David du Colombier @ 2015-05-05 22:33 UTC (permalink / raw) To: Fans of the OS Plan 9 from Bell Labs Yes, I'm pretty sure it's not related to Fossil, since it happens with vacfs as well. Also, Venti was pretty much unchanged during the last few years. I suspected it was related to the lock change on 2013-09-19. https://github.com/0intro/plan9/commit/c4d045a91e But I remember I tried to revert this change and the problem was still present. Maybe I should try again to be sure. -- David du Colombier ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [9fans] fossil+venti performance question 2015-05-05 22:33 ` David du Colombier @ 2015-05-05 22:53 ` Aram Hăvărneanu 2015-05-06 20:55 ` David du Colombier 2015-05-07 3:43 ` erik quanstrom 0 siblings, 2 replies; 52+ messages in thread From: Aram Hăvărneanu @ 2015-05-05 22:53 UTC (permalink / raw) To: Fans of the OS Plan 9 from Bell Labs It's pretty interesting that at least three people all got exactly 150kB/s on vastly different machines, both real and virtual. Maybe the number comes from some tick frequency? -- Aram Hăvărneanu ^ permalink raw reply [flat|nested] 52+ messages in thread
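Editor's sketch: the tick-frequency guess can be checked with simple arithmetic. If a sender were limited to one segment per clock tick, throughput would be the segment size times the tick rate. The 1460-byte segment size and the 100 Hz tick rate used here are assumptions chosen for illustration, not measurements from any machine in this thread, and the function name is hypothetical.

```python
def throughput_per_tick(segment_bytes: int, hz: int) -> int:
    """Bytes per second if exactly one segment is sent per clock tick."""
    return segment_bytes * hz


# Assumed values: a 1460-byte segment and a 100 Hz clock.
rate = throughput_per_tick(1460, 100)
print(f"{rate} bytes/s, about {rate // 1000} kB/s")
```

Under those assumptions the result is 146000 bytes/s, close to the roughly 150 kB/s reported. That makes the guess numerically plausible but does not by itself identify a cause.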
* Re: [9fans] fossil+venti performance question 2015-05-05 22:53 ` Aram Hăvărneanu @ 2015-05-06 20:55 ` David du Colombier 2015-05-06 21:17 ` Charles Forsyth 2015-05-07 3:43 ` erik quanstrom 1 sibling, 1 reply; 52+ messages in thread From: David du Colombier @ 2015-05-06 20:55 UTC (permalink / raw) To: Fans of the OS Plan 9 from Bell Labs Just to be sure, I tried again, and the issue is not related to the lock change on 2013-09-19. However, now I'm sure the issue was caused by a kernel change in 2013. There is no problem when running a kernel from early 2013. -- David du Colombier ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [9fans] fossil+venti performance question 2015-05-06 20:55 ` David du Colombier @ 2015-05-06 21:17 ` Charles Forsyth 2015-05-06 21:26 ` David du Colombier 0 siblings, 1 reply; 52+ messages in thread From: Charles Forsyth @ 2015-05-06 21:17 UTC (permalink / raw) To: Fans of the OS Plan 9 from Bell Labs On 6 May 2015 at 21:55, David du Colombier <0intro@gmail.com> wrote: > However, now I'm sure the issue was caused by a kernel > change in 2013. > > There is no problem when running a kernel from early 2013. > Welly, welly, welly, well. That is interesting. ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [9fans] fossil+venti performance question 2015-05-06 21:17 ` Charles Forsyth @ 2015-05-06 21:26 ` David du Colombier 2015-05-06 21:28 ` David du Colombier 2015-05-07 3:38 ` erik quanstrom 0 siblings, 2 replies; 52+ messages in thread From: David du Colombier @ 2015-05-06 21:26 UTC (permalink / raw) To: Fans of the OS Plan 9 from Bell Labs I got it! The regression was caused by the NewReno TCP change on 2013-01-24. https://github.com/0intro/plan9/commit/e8406a2f44 -- David du Colombier ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [9fans] fossil+venti performance question 2015-05-06 21:26 ` David du Colombier @ 2015-05-06 21:28 ` David du Colombier 2015-05-06 22:28 ` Charles Forsyth 2015-05-06 22:35 ` Steven Stallion 2015-05-07 3:38 ` erik quanstrom 1 sibling, 2 replies; 52+ messages in thread From: David du Colombier @ 2015-05-06 21:28 UTC (permalink / raw) To: Fans of the OS Plan 9 from Bell Labs Since the problem only happens when Fossil or vacfs is running on the same machine as Venti, I suppose this is somewhat related to how TCP behaves with the loopback. -- David du Colombier ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [9fans] fossil+venti performance question 2015-05-06 21:28 ` David du Colombier @ 2015-05-06 22:28 ` Charles Forsyth 2015-05-07 3:35 ` erik quanstrom 2015-05-06 22:35 ` Steven Stallion 1 sibling, 1 reply; 52+ messages in thread From: Charles Forsyth @ 2015-05-06 22:28 UTC (permalink / raw) To: Fans of the OS Plan 9 from Bell Labs On 6 May 2015 at 22:28, David du Colombier <0intro@gmail.com> wrote: > Since the problem only happen when Fossil or vacfs are running > on the same machine as Venti, I suppose this is somewhat related > to how TCP behaves with the loopback. > Interesting. That would explain the clock-like delays. Possibly it's nearly zero RTT in initial exchanges and then when venti has to do some work, things time out. You'd think it would only lead to needless retransmissions not increased latency but perhaps some calculation doesn't work properly with tiny values, causing one side to back off incorrectly. ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [9fans] fossil+venti performance question 2015-05-06 22:28 ` Charles Forsyth @ 2015-05-07 3:35 ` erik quanstrom 2015-05-07 6:15 ` David du Colombier 0 siblings, 1 reply; 52+ messages in thread From: erik quanstrom @ 2015-05-07 3:35 UTC (permalink / raw) To: 9fans On Wed May 6 15:30:24 PDT 2015, charles.forsyth@gmail.com wrote: > On 6 May 2015 at 22:28, David du Colombier <0intro@gmail.com> wrote: > > > Since the problem only happen when Fossil or vacfs are running > > on the same machine as Venti, I suppose this is somewhat related > > to how TCP behaves with the loopback. > > > > Interesting. That would explain the clock-like delays. > Possibly it's nearly zero RTT in initial exchanges and then when venti has > to do some work, > things time out. You'd think it would only lead to needless retransmissions > not increased latency > but perhaps some calculation doesn't work properly with tiny values, > causing one side to back off > incorrectly. i don't think that's possible. NOW is defined as MACHP(0)->ticks, so this is a pretty coarse timer that can't go backwards on intel processors. this limits the timer's resolution to HZ, which on 9atom is 1000, and 100 on pretty much anything else. further limiting the resolution is the tcp retransmit timers which according to presotto are /* bounded twixt 0.3 and 64 seconds */ so i really doubt the retransmit timers are resending anything. if someone has a system that isn't working right, please post /net/tcp/<connectionno>/^(local remote status) i'd like to have a look. quoting steve stallion ,,, > > Definitely interesting, and explains why I've never seen the regression (I > > switched to a dedicated venti server a couple of years ago). Were these the > > changes that erik submitted? ISTR him working on reno bits somewhere around > > there... > > I don't think so. Someone else submitted a different set of tcp changes > independently much earlier. 
just for the record, the earlier changes were an incorrect partial implementation of reno. i implemented newreno from the specs and added corrected window scaling and removed the problem of window slamming. we spent a month going over cases from 50µs to 100ms rtt latency and showed that we got near the theoretical max for all those cases. (big thanks to bruce wong for putting up with early, buggy versions.) during the investigation of this i found that loopback *is* slow for reasons i don't completely understand. part of this was the terrible scheduler. as part of the gsoc work, we were able to make the nix scheduler not howlingly terrible for 1-8 cpus. this improvement depends on the goodness of mcs locks. i developed a version of this, but ended up using charles' much cleaner version. there remain big problems with the tcp and ip stack. it's really slow. i can't get >400MB/s on ethernet. it seems that the 3-way interaction between tcp:tx, tcp:rx and the user-space queues is the issue. queue locking is very wasteful as well. i have some student code that addresses part of the latter problem, but it smells to me like ip/tcp.c's direct calls between tx and rx are the real issue. - erik ^ permalink raw reply [flat|nested] 52+ messages in thread
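Editor's sketch of the two limits erik describes: a tick counter that advances HZ times per second cannot resolve intervals shorter than 1000/HZ milliseconds, and a retransmit timeout clamped to the quoted 0.3 to 64 second range can never fire sooner than 300 ms. The function names are hypothetical and this is not the kernel code; it only restates the arithmetic.

```python
def tick_resolution_ms(hz: int) -> float:
    """Smallest interval, in milliseconds, that a HZ-driven tick counter can resolve."""
    return 1000.0 / hz


def clamp_rto_ms(rto_ms: float, lo_ms: float = 300.0, hi_ms: float = 64000.0) -> float:
    """Clamp a computed retransmit timeout to the 0.3 s .. 64 s bounds quoted above."""
    return max(lo_ms, min(hi_ms, rto_ms))


print(tick_resolution_ms(1000))  # HZ value quoted for 9atom
print(tick_resolution_ms(100))   # HZ value quoted for most other kernels
print(clamp_rto_ms(5.0))         # a tiny computed timeout is raised to the floor
```

With these bounds, even a near-zero loopback round-trip time cannot produce a retransmit timer shorter than 300 ms, which is the basis of the doubt expressed in the message.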
* Re: [9fans] fossil+venti performance question 2015-05-07 3:35 ` erik quanstrom @ 2015-05-07 6:15 ` David du Colombier 2015-05-07 13:17 ` erik quanstrom 0 siblings, 1 reply; 52+ messages in thread From: David du Colombier @ 2015-05-07 6:15 UTC (permalink / raw) To: Fans of the OS Plan 9 from Bell Labs

> NOW is defined as MACHP(0)->ticks, so this is a pretty coarse timer
> that can't go backwards on intel processors. this limits the timer's resolution to HZ,
> which on 9atom is 1000, and 100 on pretty much anything else. further limiting the
> resolution is the tcp retransmit timers which according to presotto are
> /* bounded twixt 0.3 and 64 seconds */
> so i really doubt the retransmit timers are resending anything. if someone
> has a system that isn't working right, please post /net/tcp/<connectionno>/^(local remote status)
> i'd like to have a look.

The Venti listener:

cpu% cat /net/tcp/2/local
::!17034
cpu% cat /net/tcp/2/remote
::!0
cpu% cat /net/tcp/2/status
Listen qin 0 qout 0 rq 0.0 srtt 4000 mdev 0 sst 65535 cwin 1460 swin 0>>0 rwin 65535>>0 qscale 0 timer.start 10 timer.count 0 rerecv 0 katimer.start 2400 katimer.count 0

The TCP connection from Fossil to Venti on the loopback:

cpu% cat /net/tcp/3/local
127.0.0.1!57796
cpu% cat /net/tcp/3/remote
127.0.0.1!17034
cpu% cat /net/tcp/3/status
Established qin 0 qout 0 rq 0.0 srtt 80 mdev 40 sst 1048560 cwin 258192 swin 1048560>>4 rwin 1048560>>4 qscale 4 timer.start 10 timer.count 10 rerecv 0 katimer.start 2400 katimer.count 427

--
David du Colombier

^ permalink raw reply [flat|nested] 52+ messages in thread
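Editor's sketch: the status lines posted in this message are a state word followed by name/value pairs, so they can be split into a dictionary for side-by-side comparison of connections. The function name is hypothetical, and the layout (one state token, then pairs) is inferred only from the samples in this thread. Values are kept as strings because fields such as swin carry a ">>scale" suffix.

```python
def parse_tcp_status(line: str) -> dict:
    """Split a /net/tcp/N/status line into its state and name/value fields."""
    fields = line.split()
    state, rest = fields[0], fields[1:]
    if len(rest) % 2 != 0:
        raise ValueError("expected name/value pairs after the state word")
    status = {"state": state}
    # Pair up alternating tokens: names at even positions, values at odd.
    for name, value in zip(rest[0::2], rest[1::2]):
        status[name] = value
    return status


loopback = parse_tcp_status(
    "Established qin 0 qout 0 rq 0.0 srtt 80 mdev 40 sst 1048560 "
    "cwin 258192 swin 1048560>>4 rwin 1048560>>4 qscale 4 "
    "timer.start 10 timer.count 10 rerecv 0 "
    "katimer.start 2400 katimer.count 427"
)
print(loopback["state"], loopback["srtt"], loopback["mdev"])
```

Parsed this way, the srtt and mdev fields of two connections can be compared directly without reading the raw lines by eye.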
* Re: [9fans] fossil+venti performance question 2015-05-07 6:15 ` David du Colombier @ 2015-05-07 13:17 ` erik quanstrom 2015-05-08 16:13 ` David du Colombier 0 siblings, 1 reply; 52+ messages in thread From: erik quanstrom @ 2015-05-07 13:17 UTC (permalink / raw) To: 9fans > cpu% cat /net/tcp/3/local > 127.0.0.1!57796 > cpu% cat /net/tcp/3/remote > 127.0.0.1!17034 > cpu% cat /net/tcp/3/status > Established qin 0 qout 0 rq 0.0 srtt 80 mdev 40 sst 1048560 cwin > 258192 swin 1048560>>4 rwin 1048560>>4 qscale 4 timer.start 10 > timer.count 10 rerecv 0 katimer.start 2400 katimer.count 427 hmm... large rtt, which suggests that someone is not servicing the queues fast enough. this is for the 1gbe machine in the room with me 11/status:Established qin 0 qout 0 rq 0.0 srtt 0 mdev 0 sst 2920 cwin 61390 swin 1048560>>4 rwin 1048560>>4 qscale 4 timer.start 10 timer.count 10 rerecv 0 katimer.start 2400 katimer.count 2101 i would suggest turning on netlog for tcp while booting and capturing the output. sorry for the short investigation. gotta run. - erik ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [9fans] fossil+venti performance question 2015-05-07 13:17 ` erik quanstrom @ 2015-05-08 16:13 ` David du Colombier 2015-05-08 16:39 ` Charles Forsyth 0 siblings, 1 reply; 52+ messages in thread From: David du Colombier @ 2015-05-08 16:13 UTC (permalink / raw) To: Fans of the OS Plan 9 from Bell Labs I've enabled tcp, tcpwin and tcprxmt logs, but there isn't anything very interesting. tcpincoming s 127.0.0.1!53150/127.0.0.1!53150 d 127.0.0.1!17034/127.0.0.1!17034 v 4/4 Also, the issue is definitely related to the loopback. There is no problem when using an address on /dev/ether0. cpu% cat /net/tcp/3/local 192.168.0.100!43125 cpu% cat /net/tcp/3/remote 192.168.0.100!17034 cpu% cat /net/tcp/3/status Established qin 0 qout 0 rq 0.0 srtt 0 mdev 0 sst 1048560 cwin 396560 swin 1048560>>4 rwin 1048560>>4 qscale 4 timer.start 10 timer.count 10 rerecv 0 katimer.start 2400 katimer.count 2106 -- David du Colombier ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [9fans] fossil+venti performance question 2015-05-08 16:13 ` David du Colombier @ 2015-05-08 16:39 ` Charles Forsyth 2015-05-08 17:16 ` David du Colombier 0 siblings, 1 reply; 52+ messages in thread From: Charles Forsyth @ 2015-05-08 16:39 UTC (permalink / raw) To: Fans of the OS Plan 9 from Bell Labs On 8 May 2015 at 17:13, David du Colombier <0intro@gmail.com> wrote: > Also, the issue is definitely related to the loopback. > There is no problem when using an address on /dev/ether0. > oh. possibly the queue isn't big enough, given the window size. it's using qpass on a Queue with Qmsg and if the queue is full, Blocks will be discarded. ^ permalink raw reply [flat|nested] 52+ messages in thread
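Editor's sketch: a toy model of the behaviour Charles describes, where a non-blocking enqueue discards the block once the queue is full. This is not the Plan 9 qio code. The class name, the byte-count limit, and the exact fullness test (queued bytes at or above the limit) are assumptions made for illustration only.

```python
class ToyQueue:
    """Toy bounded queue: qpass never blocks, it drops when the queue is full."""

    def __init__(self, limit_bytes: int):
        self.limit = limit_bytes
        self.length = 0      # bytes currently queued
        self.blocks = []     # queued blocks, oldest first
        self.dropped = 0     # number of discarded blocks

    def qpass(self, block: bytes) -> int:
        """Enqueue block and return its size, or return -1 if it was discarded."""
        if self.length >= self.limit:
            self.dropped += 1
            return -1
        self.blocks.append(block)
        self.length += len(block)
        return len(block)


q = ToyQueue(limit_bytes=3000)
results = [q.qpass(b"x" * 1460) for _ in range(4)]
print(results, q.dropped)
```

In this model a sender whose window is much larger than the queue limit would see silent drops, which the TCP layer would have to recover by retransmitting.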
* Re: [9fans] fossil+venti performance question 2015-05-08 16:39 ` Charles Forsyth @ 2015-05-08 17:16 ` David du Colombier 2015-05-08 19:24 ` David du Colombier 0 siblings, 1 reply; 52+ messages in thread From: David du Colombier @ 2015-05-08 17:16 UTC (permalink / raw) To: 9fans > oh. possibly the queue isn't big enough, given the window size. > it's using qpass on a Queue with Qmsg and if the queue is full, > Blocks will be discarded. I tried to increase the size of the queue, but no luck. -- David du Colombier ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [9fans] fossil+venti performance question 2015-05-08 17:16 ` David du Colombier @ 2015-05-08 19:24 ` David du Colombier 2015-05-08 20:03 ` Steve Simon ` (2 more replies) 0 siblings, 3 replies; 52+ messages in thread From: David du Colombier @ 2015-05-08 19:24 UTC (permalink / raw) To: 9fans I've finally figured out the issue. The slowness issue only appears on the loopback, because it provides a 16384 MTU. There is an old bug in the Plan 9 TCP stack, where the TCP MSS doesn't take the MTU into account for incoming connections. I originally fixed this issue in January 2015 for the Plan 9 port on Google Compute Engine. On GCE, there is an unusual 1460 MTU. The Plan 9 TCP stack defines a default 1460 MSS corresponding to a 1500 MTU. Then, the MSS is fixed according to the MTU for outgoing connections, but not for incoming connections. On GCE, this issue leads to IP fragmentation, but GCE didn't handle IP fragmentation properly, so the connections were dropped. On the loopback medium, I suppose this is the opposite issue. Since the TCP stack didn't fix the MSS in the incoming connection, the programs sent multiple small 1500-byte IP packets instead of large 16384-byte IP packets, but I don't know why it leads to such a slowdown. Here is the patch for the Plan 9 kernel: http://9legacy.org/9legacy/patch/9-tcp-mss.diff And Charles' 9k kernel: http://9legacy.org/9legacy/patch/9k-tcp-mss.diff -- David du Colombier ^ permalink raw reply [flat|nested] 52+ messages in thread
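Editor's sketch of the MSS/MTU relationship described in this message, assuming IPv4 and TCP headers of 20 bytes each with no options. The function names are hypothetical. The clamping function shows one way to express "take the local MTU into account"; it is not taken from the linked patches, which should be consulted for the actual change.

```python
IPV4_HEADER_BYTES = 20  # assumption: no IP options
TCP_HEADER_BYTES = 20   # assumption: no TCP options


def mss_for_mtu(mtu: int) -> int:
    """Largest TCP payload that fits in one unfragmented IP packet."""
    return mtu - IPV4_HEADER_BYTES - TCP_HEADER_BYTES


def effective_mss(peer_mss: int, local_mtu: int) -> int:
    """Illustrative clamp: never send segments larger than the local MTU allows."""
    return min(peer_mss, mss_for_mtu(local_mtu))


def segments_needed(payload_bytes: int, mss: int) -> int:
    """Number of segments required to carry payload_bytes (ceiling division)."""
    return -(-payload_bytes // mss)


print(mss_for_mtu(1500))             # Ethernet-sized MTU
print(mss_for_mtu(16384))            # loopback MTU quoted in the message
print(effective_mss(1460, 1460))     # GCE case: default MSS versus a 1460 MTU
print(segments_needed(65536, 1460), segments_needed(65536, 16344))
```

Under these assumptions a 1500 MTU gives the 1460 default MSS mentioned in the message, a 1460 MTU calls for a 1420 MSS (so an unclamped 1460 MSS overflows the MTU and fragments), and a 64 KiB transfer takes 45 segments at the default MSS versus 5 at the loopback-sized one.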
* Re: [9fans] fossil+venti performance question 2015-05-08 19:24 ` David du Colombier @ 2015-05-08 20:03 ` Steve Simon 2015-05-08 21:19 ` Bakul Shah 2015-05-09 3:11 ` cinap_lenrek 2 siblings, 0 replies; 52+ messages in thread From: Steve Simon @ 2015-05-08 20:03 UTC (permalink / raw) To: 9fans I confirm - my old performance is back. Thanks very much David. -Steve ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [9fans] fossil+venti performance question 2015-05-08 19:24 ` David du Colombier 2015-05-08 20:03 ` Steve Simon @ 2015-05-08 21:19 ` Bakul Shah 2015-05-09 14:43 ` erik quanstrom 2015-05-09 3:11 ` cinap_lenrek 2 siblings, 1 reply; 52+ messages in thread From: Bakul Shah @ 2015-05-08 21:19 UTC (permalink / raw) To: Fans of the OS Plan 9 from Bell Labs On Fri, 08 May 2015 21:24:13 +0200 David du Colombier <0intro@gmail.com> wrote: > On the loopback medium, I suppose this is the opposite issue. > Since the TCP stack didn't fix the MSS in the incoming > connection, the programs sent multiple small 1500 bytes > IP packets instead of large 16384 IP packets, but I don't > know why it leads to such a slowdown. Looking at the first few bytes in each direction of the initial TCP handshake (with tcpdump) I see: 0x0000: 4500 0030 24da 0000 <= from plan9 to freebsd 0x0000: 4500 0030 d249 4000 <= from freebsd to plan9 Looks like FreeBSD always sets the DF (don't fragment) bit (0x40 in byte 6), while plan9 doesn't (byte 6 is 0x00). Maybe plan9 should set the DF (don't fragment) bit in the IP header and try to do path MTU discovery? Either by default or under some ctl option. ^ permalink raw reply [flat|nested] 52+ messages in thread
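Editor's sketch: the two tcpdump excerpts can be decoded by hand or with a few lines of code. In an IPv4 header, bytes 6 and 7 hold the flags and fragment offset, and the don't-fragment bit is 0x4000 in that 16-bit field (0x40 in byte 6, as the message says). The function name is hypothetical; only the first eight header bytes shown in the message are parsed.

```python
def parse_ipv4_prefix(hex_words: str) -> dict:
    """Decode the first 8 bytes of an IPv4 header given as space-separated hex."""
    raw = bytes.fromhex(hex_words.replace(" ", ""))
    flags_frag = int.from_bytes(raw[6:8], "big")
    return {
        "version": raw[0] >> 4,
        "header_bytes": (raw[0] & 0x0F) * 4,
        "total_length": int.from_bytes(raw[2:4], "big"),
        "ident": int.from_bytes(raw[4:6], "big"),
        "dont_fragment": bool(flags_frag & 0x4000),
        "more_fragments": bool(flags_frag & 0x2000),
        "fragment_offset": flags_frag & 0x1FFF,
    }


plan9 = parse_ipv4_prefix("4500 0030 24da 0000")
freebsd = parse_ipv4_prefix("4500 0030 d249 4000")
print(plan9["dont_fragment"], freebsd["dont_fragment"])
```

Both packets decode as IPv4 with a 20-byte header and a total length of 48 bytes; only the FreeBSD one has the DF bit set, matching the observation in the message.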
* Re: [9fans] fossil+venti performance question 2015-05-08 21:19 ` Bakul Shah @ 2015-05-09 14:43 ` erik quanstrom 2015-05-09 17:25 ` Lyndon Nerenberg 0 siblings, 1 reply; 52+ messages in thread From: erik quanstrom @ 2015-05-09 14:43 UTC (permalink / raw) To: 9fans > Looking at the first few bytes in each dir of the initial TCP > handshake (with tcpdump) I see: > > 0x0000: 4500 0030 24da 0000 <= from plan9 to freebsd > > 0x0000: 4500 0030 d249 4000 <= from freebsd to plan9 > > Looks like FreeBSD always sets the DF (don't fragment) bit > (0x40 in byte 6), while plan9 doesn't (byte 6 is 0x00). > > May be plan9 should set the DF (don't fragment) bit in the IP > header and try to do path MTU discovery? Either by default or > under some ctl option. easy enough until one encounters devices that don't send icmp responses because it's not implemented, or somehow considered "secure" that way. - erik ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [9fans] fossil+venti performance question 2015-05-09 14:43 ` erik quanstrom @ 2015-05-09 17:25 ` Lyndon Nerenberg 2015-05-09 17:30 ` Devon H. O'Dell 2015-05-09 18:20 ` Bakul Shah 0 siblings, 2 replies; 52+ messages in thread From: Lyndon Nerenberg @ 2015-05-09 17:25 UTC (permalink / raw) To: Fans of the OS Plan 9 from Bell Labs On May 9, 2015, at 7:43 AM, erik quanstrom <quanstro@quanstro.net> wrote: > easy enough until one encounters devices that don't send icmp > responses because it's not implemented, or somehow considered > "secure" that way. Oddly enough, I don't see this 'problem' in the real world. And FreeBSD is far from being alone in the always-set-DF bit. The only place this bites is when you run into tiny shops with homegrown firewalls configured by people who don't understand networking or security. Me, I consider it a feature that these sites self-select themselves off the network. I'm certainly no worse off for not being able to talk to them. ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [9fans] fossil+venti performance question 2015-05-09 17:25 ` Lyndon Nerenberg @ 2015-05-09 17:30 ` Devon H. O'Dell 2015-05-09 17:35 ` Lyndon Nerenberg 2015-05-09 18:20 ` Bakul Shah 1 sibling, 1 reply; 52+ messages in thread From: Devon H. O'Dell @ 2015-05-09 17:30 UTC (permalink / raw) To: Fans of the OS Plan 9 from Bell Labs 2015-05-09 10:25 GMT-07:00 Lyndon Nerenberg <lyndon@orthanc.ca>: > > > On May 9, 2015, at 7:43 AM, erik quanstrom <quanstro@quanstro.net> wrote: > > > easy enough until one encounters devices that don't send icmp > > responses because it's not implemented, or somehow considered > > "secure" that way. > > Oddly enough, I don't see this 'problem' in the real world. And FreeBSD is far from being alone in the always-set-DF bit. > > The only place this bites is when you run into tiny shops with homegrown firewalls configured by people who don't understand networking or security. Me, I consider it a feature that these sites self-select themselves off the network. I'm certainly no worse off for not being able to talk to them. Or when your client is on a cell phone. Cell networks are the worst. ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [9fans] fossil+venti performance question 2015-05-09 17:30 ` Devon H. O'Dell @ 2015-05-09 17:35 ` Lyndon Nerenberg 2015-05-09 21:54 ` Devon H. O'Dell 0 siblings, 1 reply; 52+ messages in thread From: Lyndon Nerenberg @ 2015-05-09 17:35 UTC (permalink / raw) To: Fans of the OS Plan 9 from Bell Labs On May 9, 2015, at 10:30 AM, Devon H. O'Dell <devon.odell@gmail.com> wrote: > Or when your client is on a cell phone. Cell networks are the worst. Really? Quite often I slave my laptop to my phone's LTE connection, and I never have problems with PMTU. Both here (across western Canada) and in the UK. ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [9fans] fossil+venti performance question 2015-05-09 17:35 ` Lyndon Nerenberg @ 2015-05-09 21:54 ` Devon H. O'Dell 0 siblings, 0 replies; 52+ messages in thread From: Devon H. O'Dell @ 2015-05-09 21:54 UTC (permalink / raw) To: Fans of the OS Plan 9 from Bell Labs 2015-05-09 10:35 GMT-07:00 Lyndon Nerenberg <lyndon@orthanc.ca>: > > On May 9, 2015, at 10:30 AM, Devon H. O'Dell <devon.odell@gmail.com> wrote: > >> Or when your client is on a cell phone. Cell networks are the worst. > > Really? Quite often I slave my laptop to my phone's LTE connection, and I never have problems with PMTU. Both here (across western Canada) and in the UK. There are lots of hacks all over the Internet to deal with various brokenness on the carrier<->carrier side of things where one end is a cell network. Haven't seen anything come up super recently, but had to help debug some brokenness as recently as a year and a half ago that turned out to be some cell network with really old hardware that didn't do PMTU correctly, causing TLS connections to drop or die. IIRC this particular case was in France, but I also seem to recall the same issue in northern England and perhaps Ireland. ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [9fans] fossil+venti performance question 2015-05-09 17:25 ` Lyndon Nerenberg 2015-05-09 17:30 ` Devon H. O'Dell @ 2015-05-09 18:20 ` Bakul Shah 1 sibling, 0 replies; 52+ messages in thread From: Bakul Shah @ 2015-05-09 18:20 UTC (permalink / raw) To: Fans of the OS Plan 9 from Bell Labs > On May 9, 2015, at 10:25 AM, Lyndon Nerenberg <lyndon@orthanc.ca> wrote: > > >> On May 9, 2015, at 7:43 AM, erik quanstrom <quanstro@quanstro.net> wrote: >> >> easy enough until one encounters devices that don't send icmp >> responses because it's not implemented, or somehow considered >> "secure" that way. > > Oddly enough, I don't see this 'problem' in the real world. And FreeBSD is far from being alone in the always-set-DF bit. > > The only place this bites is when you run into tiny shops with homegrown firewalls configured by people who don't understand networking or security. Me, I consider it a feature that these sites self-select themselves off the network. I'm certainly no worse off for not being able to talk to them. Network admins not understanding ICMP was far more common 20 years ago. Now the game has changed. At any rate no harm in trying PMTU discovery as an option (other than a SMOP). ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [9fans] fossil+venti performance question 2015-05-08 19:24 ` David du Colombier 2015-05-08 20:03 ` Steve Simon 2015-05-08 21:19 ` Bakul Shah @ 2015-05-09 3:11 ` cinap_lenrek 2015-05-09 5:59 ` lucio ` (2 more replies) 2 siblings, 3 replies; 52+ messages in thread From: cinap_lenrek @ 2015-05-09 3:11 UTC (permalink / raw) To: 9fans do we really need to initialize tcb->mss to tcpmtu() in procsyn()? as i see it, procsyn() is called only when tcb->state is Syn_sent, which only should happen for client connections doing a connect, in which case tcpsndsyn() would have initialized tcb->mss already no? -- cinap ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [9fans] fossil+venti performance question 2015-05-09 3:11 ` cinap_lenrek @ 2015-05-09 5:59 ` lucio 2015-05-09 16:26 ` cinap_lenrek 2015-05-09 16:23 ` erik quanstrom 2015-05-09 16:59 ` erik quanstrom 2 siblings, 1 reply; 52+ messages in thread From: lucio @ 2015-05-09 5:59 UTC (permalink / raw) To: 9fans > do we really need to initialize tcb->mss to tcpmtu() in procsyn()? > as i see it, procsyn() is called only when tcb->state is Syn_sent, > which only should happen for client connections doing a connect, in > which case tcpsndsyn() would have initialized tcb->mss already no? tcb->mss may still need to be adjusted at this point, as it is when /* our sending max segment size cannot be bigger than what he asked for */ so at worst this does no harm that I can see. Of course, I'm probably least qualified to pick these nits. Lucio. ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [9fans] fossil+venti performance question 2015-05-09 5:59 ` lucio @ 2015-05-09 16:26 ` cinap_lenrek 0 siblings, 0 replies; 52+ messages in thread From: cinap_lenrek @ 2015-05-09 16:26 UTC (permalink / raw) To: 9fans yes, but i was not referring to the adjusting which isnt changed here. only the tcpmtu() call that got added. yes, it *should* not make any difference but maybe we're missing something. at worst it makes the code more confusing and causes bugs in the future because one of the initializations of mss is a lie without any effect. -- cinap ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [9fans] fossil+venti performance question 2015-05-09 3:11 ` cinap_lenrek 2015-05-09 5:59 ` lucio @ 2015-05-09 16:23 ` erik quanstrom 2015-05-10 4:55 ` erik quanstrom 2015-05-10 20:19 ` cinap_lenrek 2015-05-09 16:59 ` erik quanstrom 2 siblings, 2 replies; 52+ messages in thread From: erik quanstrom @ 2015-05-09 16:23 UTC (permalink / raw) To: 9fans

On Fri May 8 20:12:57 PDT 2015, cinap_lenrek@felloff.net wrote:
> do we really need to initialize tcb->mss to tcpmtu() in procsyn()?
> as i see it, procsyn() is called only when tcb->state is Syn_sent,
> which only should happen for client connections doing a connect, in
> which case tcpsndsyn() would have initialized tcb->mss already no?

i think there was a subtle reason for this, but i don't recall. a real reason for setting it here is because it makes the code easier to reason about, imo.

there are a couple problems with the patch as it stands. they are inherited from previous mistakes.

  * the setting of tpriv->stats[Mss] is bogus. it's not shared between connections. it is also v4 only.
  * so, mss should be added to each tcp connection's status file.
  * the setting of tcb->mss in tcpincoming is not correct, tcp->mss is set by SYN, not by ACK, and may not be reset. (see snoopy below.)
  * the SYN-ACK needs to send the local mss, not echo the remote mss. asymmetry is "fine" in the other side, even if ip/tcp.c isn't smart enough to keep tx and rx mss separate. (scare quotes = untested, there may be some performance niggles if the sender is sending legal packets larger than tcb->mss.)

my patch to nix is below. i haven't submitted it yet.

- erik

---
005319 ms
	ether(s=a0369f1c3af7 d=0cc47a328da4 pr=0800 ln=62)
	ip(s=10.1.1.8 d=10.1.1.9 id=ee54 frag=0000 ttl=255 pr=6 ln=48)
	tcp(s=38903 d=17766 seq=3552109414 ack=0 fl=S win=65535 ck=d68e ln=0 opt4=(mss 1460) opt3=(wscale 4) opt=NOOP)
005320 ms
	ether(s=0cc47a328da4 d=a0369f1c3af7 pr=0800 ln=62)
	ip(s=10.1.1.9 d=10.1.1.8 id=54d3 frag=0000 ttl=255 pr=6 ln=48)
	tcp(s=17766 d=38903 seq=441373010 ack=3552109415 fl=AS win=65535 ck=eadc ln=0 opt4=(mss 1460) opt3=(wscale 4) opt=NOOP)
---
/n/dump/2015/0509/sys/src/nix/ip/tcp.c:491,501 - /sys/src/nix/ip/tcp.c:491,502
  	s = (Tcpctl*)(c->ptcl);

  	return snprint(state, n,
- 		"%s qin %d qout %d rq %d.%d srtt %d mdev %d sst %lud cwin %lud swin %lud>>%d rwin %lud>>%d qscale %d timer.start %d timer.count %d rerecv %d katimer.start %d katimer.count %d\n",
+ 		"%s qin %d qout %d rq %d.%d mss %d srtt %d mdev %d sst %lud cwin %lud swin %lud>>%d rwin %lud>>%d qscale %d timer.start %d timer.count %d rerecv %d katimer.start %d katimer.count %d\n",
  		tcpstates[s->state],
  		c->rq ? qlen(c->rq) : 0,
  		c->wq ? qlen(c->wq) : 0,
  		s->nreseq, s->reseqlen,
+ 		s->mss,
  		s->srtt, s->mdev, s->ssthresh,
  		s->cwind, s->snd.wnd, s->rcv.scale, s->rcv.wnd, s->snd.scale, s->qscale,
/n/dump/2015/0509/sys/src/nix/ip/tcp.c:843,854 - /sys/src/nix/ip/tcp.c:844,857
  /* mtu (- TCP + IP hdr len) of 1st hop */
  static int
- tcpmtu(Proto *tcp, uchar *addr, int version, uint *scale)
+ tcpmtu(Proto *tcp, uchar *addr, int version, uint reqmss, uint *scale)
  {
+ 	Tcppriv *tpriv;
  	Ipifc *ifc;
  	int mtu;

  	ifc = findipifc(tcp->f, addr, 0);
+ 	tpriv = tcp->priv;
  	switch(version){
  	default:
  	case V4:
/n/dump/2015/0509/sys/src/nix/ip/tcp.c:855,865 - /sys/src/nix/ip/tcp.c:858,870
  		mtu = DEF_MSS;
  		if(ifc != nil)
  			mtu = ifc->maxtu - ifc->m->hsize - (TCP4_PKT + TCP4_HDRSIZE);
+ 		tpriv->stats[Mss] = mtu;
  		break;
  	case V6:
  		mtu = DEF_MSS6;
  		if(ifc != nil)
  			mtu = ifc->maxtu - ifc->m->hsize - (TCP6_PKT + TCP6_HDRSIZE);
+ 		tpriv->stats[Mss] = mtu + (TCP6_PKT + TCP6_HDRSIZE) - (TCP4_PKT + TCP4_HDRSIZE);
  		break;
  	}
  	/*
/n/dump/2015/0509/sys/src/nix/ip/tcp.c:868,873 - /sys/src/nix/ip/tcp.c:873,882
  	 */
  	*scale = Defadvscale;

+ 	/* our sending max segment size cannot be bigger than what he asked for */
+ 	if(reqmss != 0 && reqmss < mtu)
+ 		mtu = reqmss;
+
  	return mtu;
  }
/n/dump/2015/0509/sys/src/nix/ip/tcp.c:1300,1307 - /sys/src/nix/ip/tcp.c:1309,1314
  static void
  tcpsndsyn(Conv *s, Tcpctl *tcb)
  {
- 	Tcppriv *tpriv;
-
  	tcb->iss = (nrand(1<<16)<<16)|nrand(1<<16);
  	tcb->rttseq = tcb->iss;
  	tcb->snd.wl2 = tcb->iss;
/n/dump/2015/0509/sys/src/nix/ip/tcp.c:1314,1322 - /sys/src/nix/ip/tcp.c:1321,1327
  	tcb->sndsyntime = NOW;

  	/* set desired mss and scale */
- 	tcb->mss = tcpmtu(s->p, s->laddr, s->ipversion, &tcb->scale);
- 	tpriv = s->p->priv;
- 	tpriv->stats[Mss] = tcb->mss;
+ 	tcb->mss = tcpmtu(s->p, s->laddr, s->ipversion, 0, &tcb->scale);
  }

  void
/n/dump/2015/0509/sys/src/nix/ip/tcp.c:1492,1498 - /sys/src/nix/ip/tcp.c:1497,1503
  	seg.ack = lp->irs+1;
  	seg.flags = SYN|ACK;
  	seg.urg = 0;
- 	seg.mss = tcpmtu(tcp, lp->laddr, lp->version, &scale);
+ 	seg.mss = tcpmtu(tcp, lp->laddr, lp->version, 0, &scale);	/* send our mss, not lp->mss */
  	seg.wnd = QMAX;

  	/* if the other side set scale, we should too */
/n/dump/2015/0509/sys/src/nix/ip/tcp.c:1767,1777 - /sys/src/nix/ip/tcp.c:1772,1779
  	tcb->flgcnt = 0;
  	tcb->flags |= SYNACK;

- 	/* our sending max segment size cannot be bigger than what he asked for */
- 	if(lp->mss != 0 && lp->mss < tcb->mss) {
- 		tcb->mss = lp->mss;
- 		tpriv->stats[Mss] = tcb->mss;
- 	}
+ 	/* per rfc, we can't set the mss any more */
+ 	// tcb->mss = tcpmtu(s->p, lp->laddr, lp->version, lp->mss, &tcb->scale);

  	/* window scaling */
  	tcpsetscale(new, tcb, lp->rcvscale, lp->sndscale);
/n/dump/2015/0509/sys/src/nix/ip/tcp.c:3014,3020 - /sys/src/nix/ip/tcp.c:3016,3021
  procsyn(Conv *s, Tcp *seg)
  {
  	Tcpctl *tcb;
- 	Tcppriv *tpriv;

  	tcb = (Tcpctl*)s->ptcl;
  	tcb->flags |= FORCE;
/n/dump/2015/0509/sys/src/nix/ip/tcp.c:3026,3036 - /sys/src/nix/ip/tcp.c:3027,3033
  	tcb->irs = seg->seq;

  	/* our sending max segment size cannot be bigger than what he asked for */
- 	if(seg->mss != 0 && seg->mss < tcb->mss) {
- 		tcb->mss = seg->mss;
- 		tpriv = s->p->priv;
- 		tpriv->stats[Mss] = tcb->mss;
- 	}
+ 	tcb->mss = tcpmtu(s->p, s->laddr, s->ipversion, seg->mss, &tcb->scale);

  	tcb->snd.wnd = seg->wnd;
  	initialwindow(tcb);

^ permalink raw reply [flat|nested] 52+ messages in thread
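The rule the patch moves into tcpmtu() can be read on its own: derive a segment size from the first-hop MTU, then lower it to the peer's requested value if that is smaller. The following is a minimal sketch of that rule, not the kernel code. The names mss_from_mtu and clamp_mss are hypothetical, and the header sizes are the minimum IPv4, IPv6 and TCP headers without options, which are assumed here to correspond to the TCP4_PKT/TCP4_HDRSIZE and TCP6_PKT/TCP6_HDRSIZE constants in the diff.

```c
#include <assert.h>

enum {
	IP4_HDR = 20,	/* minimum IPv4 header, no options */
	IP6_HDR = 40,	/* fixed IPv6 header */
	TCP_HDR = 20,	/* minimum TCP header, no options */
};

/*
 * hypothetical helper: largest TCP payload that fits in one
 * IP packet of ipmtu bytes (link header already excluded)
 */
unsigned
mss_from_mtu(unsigned ipmtu, int version)
{
	unsigned hdr = (version == 6 ? IP6_HDR : IP4_HDR) + TCP_HDR;

	assert(ipmtu > hdr);
	return ipmtu - hdr;
}

/*
 * same test as the lines added to tcpmtu() in the patch:
 * reqmss == 0 means the peer sent no mss option
 */
unsigned
clamp_mss(unsigned localmss, unsigned reqmss)
{
	if(reqmss != 0 && reqmss < localmss)
		return reqmss;
	return localmss;
}
```

With a 1500-byte IP MTU, mss_from_mtu(1500, 4) gives 1460, the value both sides advertise in the snoopy trace above. A peer asking for more than the local value leaves it unchanged.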
* Re: [9fans] fossil+venti performance question 2015-05-09 16:23 ` erik quanstrom @ 2015-05-10 4:55 ` erik quanstrom 2015-05-10 5:07 ` erik quanstrom 2015-05-10 20:19 ` cinap_lenrek 1 sibling, 1 reply; 52+ messages in thread From: erik quanstrom @ 2015-05-10 4:55 UTC (permalink / raw) To: 9fans

for what it's worth, the original newreno work tcp does not have the mtu bug. on a 8 processor system i have around here i get

bwc; while() nettest -a 127.1
tcp!127.0.0.1!40357 count 100000; 819200000 bytes in 1.505948 s @ 519 MB/s (0ms)
tcp!127.0.0.1!47983 count 100000; 819200000 bytes in 1.377984 s @ 567 MB/s (0ms)
tcp!127.0.0.1!53197 count 100000; 819200000 bytes in 1.299967 s @ 601 MB/s (0ms)
tcp!127.0.0.1!61569 count 100000; 819200000 bytes in 1.418073 s @ 551 MB/s (0ms)

however, after fixing things so the initial cwind isn't hosed, i get a little better story:

bwc; while() nettest -a 127.1
tcp!127.0.0.1!54261 count 100000; 819200000 bytes in .5947659 s @ 1.31e+03 MB/s (0ms)

boo yah! not bad for trying to clean up some constants.

- erik

^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [9fans] fossil+venti performance question 2015-05-10 4:55 ` erik quanstrom @ 2015-05-10 5:07 ` erik quanstrom 2015-05-10 17:57 ` David du Colombier 0 siblings, 1 reply; 52+ messages in thread From: erik quanstrom @ 2015-05-10 5:07 UTC (permalink / raw) To: 9fans > however, after fixing things so the initial cwind isn't hosed, i get a little better story: so, actually, i think this is the root cause. the initial cwind is misset for loopback. i bet that the symptom folks will see is that /net/tcp/stats shows fragmentation when performance sucks. evidently there is a backoff bug in sources' tcp, too. i'd love confirmation of this. - erik ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [9fans] fossil+venti performance question 2015-05-10 5:07 ` erik quanstrom @ 2015-05-10 17:57 ` David du Colombier 2015-05-10 20:18 ` erik quanstrom 0 siblings, 1 reply; 52+ messages in thread From: David du Colombier @ 2015-05-10 17:57 UTC (permalink / raw) To: Fans of the OS Plan 9 from Bell Labs >> however, after fixing things so the initial cwind isn't hosed, i get a little better story: > > so, actually, i think this is the root cause. the initial cwind is misset for loopback. > i bet that the symptom folks will see is that /net/tcp/stats shows fragmentation when > performance sucks. evidently there is a backoff bug in sources' tcp, too. What is your cwind change? -- David du Colombier ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [9fans] fossil+venti performance question 2015-05-10 17:57 ` David du Colombier @ 2015-05-10 20:18 ` erik quanstrom 0 siblings, 0 replies; 52+ messages in thread From: erik quanstrom @ 2015-05-10 20:18 UTC (permalink / raw) To: 9fans On Sun May 10 10:58:55 PDT 2015, 0intro@gmail.com wrote: > >> however, after fixing things so the initial cwind isn't hosed, i get a little better story: > > > > so, actually, i think this is the root cause. the initial cwind is misset for loopback. > > i bet that the symptom folks will see is that /net/tcp/stats shows fragmentation when > > performance sucks. evidently there is a backoff bug in sources' tcp, too. > > What is your cwind change? > the patch is here: /n/atom/patch/tcpmss note i applied a patch to nettest(8) to simulate a rpc-style protocol. i still get ~500MB/s with my test machine simulating rpc-style transactions, or 15µs per 8k transaction. we're at least an order of magnitude off the performance mark for this. a similar test using pipe(2) shows a latency of 5.7µs (!) for a pipe-based rpc, which limits us to about 1.4 GB/s for 8k pipe-based ping-pong rpc. - erik ^ permalink raw reply [flat|nested] 52+ messages in thread
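The two figures in the message above follow from the per-transaction latency: if a strictly serial exchange moves one 8k payload per round trip, throughput is payload size divided by latency. A small sketch of that arithmetic follows. The function name rpc_throughput is hypothetical, and "8k" is assumed to mean 8192 bytes, which matches the 819200000 bytes for count 100000 in the earlier nettest output.

```c
#include <assert.h>

/*
 * bytes per second for a strictly serial request/response
 * exchange in which one transaction of `bytes` bytes
 * completes every `usec` microseconds
 */
double
rpc_throughput(double bytes, double usec)
{
	assert(usec > 0);
	return bytes / (usec * 1e-6);
}
```

rpc_throughput(8192, 15) comes to roughly 5.5e8 bytes/s, in line with the quoted ~500MB/s, and rpc_throughput(8192, 5.7) to roughly 1.44e9 bytes/s, in line with the quoted "about 1.4 GB/s" ceiling for the pipe-based case.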
* Re: [9fans] fossil+venti performance question 2015-05-09 16:23 ` erik quanstrom 2015-05-10 4:55 ` erik quanstrom @ 2015-05-10 20:19 ` cinap_lenrek 2015-05-10 20:51 ` erik quanstrom 1 sibling, 1 reply; 52+ messages in thread From: cinap_lenrek @ 2015-05-10 20:19 UTC (permalink / raw) To: 9fans

> * the SYN-ACK needs to send the local mss, not echo the remote mss.
> asymmetry is "fine" in the other side, even if ip/tcp.c isn't smart enough to
> keep tx and rx mss seperate. (scare quotes = untested, there may be
> some performance niggles if the sender is sending legal packets larger than
> tcb->mss.)

that is what it already does as far as i can see. on the server side, we receive a SYN, put it in limbo and reply with SYN|ACK (sndsynack()) sending our local mss straight from tcpmtu(), no adjust. at this point there's no connection or tcb as everything is still in limbo. only once we receive the ACK, tcpincoming() gets called which pulls the info we got so far (including the mss sent by the client in the SYN packet) out of limbo and sets up a connection with its tcb.

to summarize what happens on the server for incoming connection:

1.a) tcpiput() gets a SYN packet for Listening connection, calls limbo().
1.b) limbo() saves the info (including mss) from SYN in limbo database and calls sndsynack().
1.c) sndsynack() sends SYN|ACK packet with mss option set from tcpmtu() without any adjust.
2.a) tcpiput() gets an ACK packet for Listening connection, calls tcpincoming().
2.b) tcpincoming() looks in limbo, finds lp. and makes new connection.
3.c) initialize our connections tcb->mss.

> * the setting of tcb->mss in tcpincoming is not correct, tcp->mss is
> set by SYN, not by ACK, and may not be reset. (see snoopy below.)

you say we shouldnt initialize tcb->mss in 3.c and not use the mss from the initial SYN to adjust it. i dont understand why not as i dont see where it would be initialized otherwise. it appears that was what the initial patch from david was about to fix which made sense to me.

as far as i can see, the procsyn() is unrelated to server side incoming connections. it only gets called on behalf of a client outgoing connect when the connection is in Syn_sent state and processes the SYN|ACK that was generated by the process described in 1.c above.

-- cinap

^ permalink raw reply [flat|nested] 52+ messages in thread
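The server-side sequence summarized in the message above can be sketched as three small steps: remember the client's mss when the SYN arrives, advertise the local mss unchanged in the SYN|ACK, and only when the ACK arrives create the connection and settle its sending mss. This is an illustration under stated assumptions, not the code in ip/tcp.c. The Limbo and Tcb structs and the functions on_syn, synack_mss and on_ack are hypothetical stand-ins for limbo(), sndsynack() and tcpincoming(), and the final clamp mirrors the lp->mss test visible in the diff earlier in the thread.

```c
#include <assert.h>

typedef struct Limbo Limbo;	/* hypothetical: state kept between SYN and ACK */
struct Limbo {
	unsigned mss;		/* mss option from the client's SYN, 0 if absent */
};

typedef struct Tcb Tcb;		/* hypothetical: per-connection control block */
struct Tcb {
	unsigned mss;		/* segment size used when sending */
};

/* SYN received: no connection yet, only record what the client asked for */
void
on_syn(Limbo *lp, unsigned peermss)
{
	lp->mss = peermss;
}

/* SYN|ACK sent: advertise the local mss, without adjusting to the peer */
unsigned
synack_mss(unsigned localmss)
{
	return localmss;
}

/*
 * ACK received: the connection is created now; start from the local
 * mss and lower it to the value saved from the SYN if that is smaller
 */
void
on_ack(Tcb *tcb, const Limbo *lp, unsigned localmss)
{
	assert(localmss > 0);
	tcb->mss = localmss;
	if(lp->mss != 0 && lp->mss < tcb->mss)
		tcb->mss = lp->mss;
}
```

In this model the only place the new connection's mss can be set is on_ack, which is the point cinap makes about tcpincoming(): if nothing initializes it there, nothing does.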
* Re: [9fans] fossil+venti performance question 2015-05-10 20:19 ` cinap_lenrek @ 2015-05-10 20:51 ` erik quanstrom 2015-05-10 21:34 ` cinap_lenrek 0 siblings, 1 reply; 52+ messages in thread From: erik quanstrom @ 2015-05-10 20:51 UTC (permalink / raw) To: 9fans > 2.a) tcpiput() gets a ACK packet for Listening connection, calls tcpincoming(). > 2.b) tcpincoming() looks in limbo, finds lp. and makes new connection. > 3.c) initialize our connections tcb->mss. > > > * the setting of tcb->mss in tcpincoming is not correct, tcp->mss is > > set by SYN, not by ACK, and may not be reset. (see snoopy below.) > > you say we shouldnt initialize tcb->mss in 3.c and not use the mss from the > initial SYN to adjust it. i dont understand why not as i dont see where it > would be initialized otherwise. it appears that was what the initial patch > from david was about to fix which made sense to me. that was the opposite of what i was saying. the issue was i misread tcpincoming(). - erik ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [9fans] fossil+venti performance question 2015-05-10 20:51 ` erik quanstrom @ 2015-05-10 21:34 ` cinap_lenrek 2015-05-11 1:23 ` erik quanstrom 0 siblings, 1 reply; 52+ messages in thread From: cinap_lenrek @ 2015-05-10 21:34 UTC (permalink / raw) To: 9fans how is this the opposite? your patch shows the tcb->mss init being removed completely from tcpincoming(). - /* our sending max segment size cannot be bigger than what he asked for */ - if(lp->mss != 0 && lp->mss < tcb->mss) { - tcb->mss = lp->mss; - tpriv->stats[Mss] = tcb->mss; - } + /* per rfc, we can't set the mss any more */ + // tcb->mss = tcpmtu(s->p, lp->laddr, lp->version, lp->mss, &tcb->scale); -- cinap ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [9fans] fossil+venti performance question 2015-05-10 21:34 ` cinap_lenrek @ 2015-05-11 1:23 ` erik quanstrom 0 siblings, 0 replies; 52+ messages in thread From: erik quanstrom @ 2015-05-11 1:23 UTC (permalink / raw) To: 9fans On Sun May 10 14:36:15 PDT 2015, cinap_lenrek@felloff.net wrote: > how is this the opposite? your patch shows the tcb->mss init being removed > completely from tcpincoming(). > > - /* our sending max segment size cannot be bigger than what he asked for */ > - if(lp->mss != 0 && lp->mss < tcb->mss) { > - tcb->mss = lp->mss; > - tpriv->stats[Mss] = tcb->mss; > - } > + /* per rfc, we can't set the mss any more */ > + // tcb->mss = tcpmtu(s->p, lp->laddr, lp->version, lp->mss, &tcb->scale); i haven't updated the patch. - erik ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [9fans] fossil+venti performance question 2015-05-09 3:11 ` cinap_lenrek 2015-05-09 5:59 ` lucio 2015-05-09 16:23 ` erik quanstrom @ 2015-05-09 16:59 ` erik quanstrom 2 siblings, 0 replies; 52+ messages in thread From: erik quanstrom @ 2015-05-09 16:59 UTC (permalink / raw) To: 9fans On Fri May 8 20:12:57 PDT 2015, cinap_lenrek@felloff.net wrote: > do we really need to initialize tcb->mss to tcpmtu() in procsyn()? > as i see it, procsyn() is called only when tcb->state is Syn_sent, > which only should happen for client connections doing a connect, in > which case tcpsndsyn() would have initialized tcb->mss already no? yes, we should. the bug is that we confuse send mss and receive mss. the sender's mss is the one we need to respect here. tcpsndsyn() should not set the mss, the mss it calculates is for rx. - erik ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [9fans] fossil+venti performance question 2015-05-06 21:28 ` David du Colombier 2015-05-06 22:28 ` Charles Forsyth @ 2015-05-06 22:35 ` Steven Stallion 2015-05-06 23:47 ` Charles Forsyth 1 sibling, 1 reply; 52+ messages in thread From: Steven Stallion @ 2015-05-06 22:35 UTC (permalink / raw) To: Fans of the OS Plan 9 from Bell Labs Definitely interesting, and explains why I've never seen the regression (I switched to a dedicated venti server a couple of years ago). Were these the changes that erik submitted? ISTR him working on reno bits somewhere around there... On Wed, May 6, 2015 at 4:28 PM, David du Colombier <0intro@gmail.com> wrote: > Since the problem only happen when Fossil or vacfs are running > on the same machine as Venti, I suppose this is somewhat related > to how TCP behaves with the loopback. > > -- > David du Colombier ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [9fans] fossil+venti performance question 2015-05-06 22:35 ` Steven Stallion @ 2015-05-06 23:47 ` Charles Forsyth 0 siblings, 0 replies; 52+ messages in thread From: Charles Forsyth @ 2015-05-06 23:47 UTC (permalink / raw) To: Fans of the OS Plan 9 from Bell Labs On 6 May 2015 at 23:35, Steven Stallion <sstallion@gmail.com> wrote: > Were these the changes that erik submitted? I don't think so. Someone else submitted a different set of tcp changes independently much earlier. ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [9fans] fossil+venti performance question 2015-05-06 21:26 ` David du Colombier 2015-05-06 21:28 ` David du Colombier @ 2015-05-07 3:38 ` erik quanstrom 1 sibling, 0 replies; 52+ messages in thread From: erik quanstrom @ 2015-05-07 3:38 UTC (permalink / raw) To: 9fans On Wed May 6 14:28:03 PDT 2015, 0intro@gmail.com wrote: > I got it! > > The regression was caused by the NewReno TCP > change on 2013-01-24. > > https://github.com/0intro/plan9/commit/e8406a2f44 if you have proof, i'd be interested in reproduction of the issue from the original source, or perhaps just nix. let me know if i can help. - erik ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [9fans] fossil+venti performance question 2015-05-05 22:53 ` Aram Hăvărneanu 2015-05-06 20:55 ` David du Colombier @ 2015-05-07 3:43 ` erik quanstrom 1 sibling, 0 replies; 52+ messages in thread From: erik quanstrom @ 2015-05-07 3:43 UTC (permalink / raw) To: 9fans On Tue May 5 15:54:45 PDT 2015, aram.h@mgk.ro wrote: > It's pretty interesting that at least three people all got exactly > 150kB/s on vastly different machines, both real and virtual. Maybe the > number comes from some tick frequency? i might suggest altering HZ and seeing if there is a throughput change in the same ratio. - erik ^ permalink raw reply [flat|nested] 52+ messages in thread
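The experiment suggested above has a simple shape: if the read rate is bounded by a fixed amount of data per clock tick, then throughput equals bytes per tick times HZ, and changing HZ should change throughput by the same factor. The sketch below only models that hypothesis. The function names are hypothetical, and the figures in the usage note are illustrative and not taken from the thread.

```c
#include <assert.h>

/* hypothetical model: at most `bytes_per_tick` bytes move per clock tick */
double
tick_bound_throughput(double bytes_per_tick, double hz)
{
	assert(hz > 0);
	return bytes_per_tick * hz;
}

/*
 * returns 1 if throughput measured at hz_b is what scaling the
 * measurement at hz_a by hz_b/hz_a predicts, within a relative
 * tolerance tol (0.1 means 10%), else 0
 */
int
scales_with_hz(double thr_a, double hz_a, double thr_b, double hz_b, double tol)
{
	double expect, diff;

	assert(hz_a > 0 && hz_b > 0);
	expect = thr_a * (hz_b / hz_a);
	diff = thr_b - expect;
	if(diff < 0)
		diff = -diff;
	return diff <= tol * expect;
}
```

As an illustration only: one 1500-byte transfer per tick at 100 ticks per second would give 150000 bytes/s. If doubling HZ doubled the measured rate, scales_with_hz would return 1 and a tick-bound cause would look plausible; an unchanged rate would argue against it.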
* Re: [9fans] fossil+venti performance question 2015-05-04 18:11 ` Aram Hăvărneanu 2015-05-04 18:51 ` David du Colombier @ 2015-05-05 15:07 ` KADOTA Kyohei 1 sibling, 0 replies; 52+ messages in thread From: KADOTA Kyohei @ 2015-05-05 15:07 UTC (permalink / raw) To: Fans of the OS Plan 9 from Bell Labs Thanks Aram. > I have spent some time > debugging this, but unfortunately, I couldn't find the root cause, and > I just stopped using fossil. I tried to measure the effect on performance of replacing individual components: 1) mbr or GRUB 2) pbs or pbslba 3) sdata or sdvirtio (sdvirtio is imported from 9legacy) 4) kernel configurations (9pcf, 9pccpuf, 9pcauth, etc) Unfortunately, none of the above had any effect on performance. — kadota ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [9fans] fossil+venti performance question 2015-05-04 16:10 ` Anthony Sorace 2015-05-04 18:11 ` Aram Hăvărneanu @ 2015-05-05 14:47 ` KADOTA Kyohei 2015-05-05 15:46 ` steve 1 sibling, 1 reply; 52+ messages in thread From: KADOTA Kyohei @ 2015-05-05 14:47 UTC (permalink / raw) To: Fans of the OS Plan 9 from Bell Labs Thanks Anthony. > I bet if you re-run the same test twice in a > row, you’re going to see dramatically improved > performance. I re-ran ‘iostats md5sum /386/9pcf’. The second read is very fast: the first read result is 152KB/s, the second read result is 232MB/s. > Your write performance in that test isn’t really > relevant: they’re not hitting the file system at all. I tried writing 1GB of data to the file system: iostats dd -if /dev/zero -of output -ibs 1024k -obs 1024k -count 1024 The write result of dd is 31MB/s. But this test may only write to fossil; it may not write to venti. > I’m not sure why you’d see a difference in a > fossil+venti setup of a different size, but the > partition size relationships, and the in-memory > cache size relationships, are what’s mostly important. My machine has 2GB of memory. The Plan 9 configuration is almost default (except /dev/sdC0/bloom). Increasing the memory size is difficult, because it is determined by the public QEMU/KVM service plan. — kadota ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [9fans] fossil+venti performance question 2015-05-05 14:47 ` KADOTA Kyohei @ 2015-05-05 15:46 ` steve 2015-05-05 15:54 ` David du Colombier 0 siblings, 1 reply; 52+ messages in thread From: steve @ 2015-05-05 15:46 UTC (permalink / raw) To: Fans of the OS Plan 9 from Bell Labs I too see this, and feel, no proof, that things used to be better. I.e. the first time I read a file from venti it is very, very slow. subsequent reads from the ram cache are quick. I think venti used to be faster a few years ago. maybe another effect of this is the boot time seems slower than it used to be. sorry to be vague. -Steve > On 5 May 2015, at 15:47, KADOTA Kyohei <lufia@me.com> wrote: > > Thanks Anthony. > >> I bet if you re-run the same test twice in a >> row, you’re going to see dramatically improved >> performance. > > I try to re-run ‘iostats md5sum /386/9pcf’. > Read result is very fast. > > first read result is 152KB/s. > second read result is 232MB/s. > >> Your write performance in that test isn’t really >> relevant: they’re not hitting the file system at all. > > I think to write 1GB data to filesystem: > > iostats dd -if /dev/zero -of output -ibs 1024k -obs 1024k -count 1024 > > Write result of dd is 31MB/s. > But this test may just write to fossil. It may not write to venti. > >> I’m not sure why you’d see a difference in a >> fossil+venti setup of a different size, but the >> partition size relationships, and the in-memory >> cache size relationships, are what’s mostly important. > > My hardware has 2GB memory. > Plan 9 configurations are almost default. (except /dev/sdC0/bloom) > To increase memory size is difficult, > because memory size is determined by public QEMU/KVM service plan. > > — > kadota ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [9fans] fossil+venti performance question 2015-05-05 15:46 ` steve @ 2015-05-05 15:54 ` David du Colombier 0 siblings, 0 replies; 52+ messages in thread From: David du Colombier @ 2015-05-05 15:54 UTC (permalink / raw) To: Fans of the OS Plan 9 from Bell Labs > I too see this, and feel, no proof, that things used to be better. I.e. the first time I read a file from venti it is very, very slow. subsequent reads from the ram cache are quick. > > I think venti used to be faster a few years ago. maybe another effect of this is the boot time seems slower than it used to be. > > sorry to be vague. I'm pretty sure this issue started something like two years ago. It looks like a regression somewhere. -- David du Colombier ^ permalink raw reply [flat|nested] 52+ messages in thread