On Sun, May 14, 2017 at 12:52:11AM +0200, Jason A. Donenfeld wrote:
> One small and unfortunate thought just occurred to me: the backporting
> to really old kernels I'm pretty sure is way less efficient than newer
> kernels on the RX, due to some missing core fast-path APIs in the old
> kernels. In particular, I had to wrap the UDP layer with some nasty
> hacks to get packets out, whereas newer kernels have an elegant API
> for that which integrates in the right place. Just a thought... I
> haven't actually done concrete measurements though.

Good idea, I have redone the same setup with kernel 4.9.18 from
jessie-backports.

TL;DR: when switching from kernel 3.16 to 4.9, wireguard has a 50%
performance gain in the most favourable case (large MTU). Also, iperf
seems generally faster than iperf3, most likely because iperf3 has no
multi-threading.

The full results, still over Infiniband 40G, are:

- unidirectional iperf[1 thread] with 1420 MTU: 2.1 Gbit/s
  (instead of 1.6 Gbit/s with kernel 3.16)

- bidirectional iperf[1 thread] with 1420 MTU: 780 Mbit/s + 1.0 Gbit/s
  (instead of 700 Mbit/s + 800 Mbit/s with kernel 3.16)

- unidirectional iperf[8 threads] with 65450 MTU: 11.4 Gbit/s
  (instead of 7.6 Gbit/s with kernel 3.16)

Without wireguard, as a baseline:

- unidirectional iperf[8 threads] with 65450 MTU: 23.3 Gbit/s
  (instead of 21.7 Gbit/s with kernel 3.16)

So, the new kernel definitely improved performance: by 7% for iperf,
and by up to 50% for wireguard + iperf.

> > - iperf 2.0.5
>
> iperf2 has the -b bidirectional mode which is nice, but it seems like
> most people are using iperf3 now. Out of curiosity, is there a reason
> for preferring iperf2, beyond the -b switch?

As I said, it was just a quick test (to see if it worked fine with
Jessie's 3.16 kernel). Iperf was already installed but iperf3 was not.

It turns out that iperf3 is slower in this setup, most likely because
iperf is multi-threaded but iperf3 is not.
For the baseline test (without wireguard):

- iperf[1 thread]: 13.7 Gbit/s
- iperf[8 threads]: 23.4 Gbit/s
- iperf3[1 stream]: 16.8 Gbit/s
- iperf3[8 streams]: 13.6 Gbit/s

This was with iperf 2.0.5 and iperf3 3.0.7 (jessie). Just to be sure,
with more recent versions (iperf 2.0.9, iperf3 3.1.3):

- iperf[1 thread]: 13.6 Gbit/s
- iperf[8 threads]: 23.3 Gbit/s
- iperf3[1 stream]: 16.8 Gbit/s
- iperf3[8 streams]: 13.6 Gbit/s

So, the behaviour is the same: iperf is faster than iperf3 thanks to
multi-threading.

I also tested through wireguard:

- unidirectional iperf3[1 stream] with 65450 MTU: 6.47 Gbit/s
  (instead of 6.42 Gbit/s with iperf[1 thread])

- unidirectional iperf3[8 streams] with 65450 MTU: 10.9 Gbit/s
  (instead of 11.4 Gbit/s with iperf[8 threads])

> > - Xeon E5520 @2.27GHz (2 CPUs, 4 cores each)
> > - Mellanox ConnectX IB 4X QDR MT26428
>
> *drools* That's some awesome hardware!

Well, it's not my hardware :) But it's not exactly new, it dates back
to 2009.
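
PS: for anyone who wants to re-check the kernel 3.16 vs 4.9 gains
quoted earlier in this message, here is a minimal Python sketch that
recomputes them from the measured throughputs. The helper name
gain_percent and the dictionary labels are only illustrative; the
numbers are the ones reported in this thread.

```python
def gain_percent(old, new):
    """Relative throughput gain of `new` over `old`, in percent."""
    return (new - old) / old * 100.0


# (kernel 3.16, kernel 4.9) throughput in Gbit/s, as reported in this thread
results = {
    "wireguard, iperf[1 thread], MTU 1420": (1.6, 2.1),
    "wireguard, iperf[8 threads], MTU 65450": (7.6, 11.4),
    "baseline, iperf[8 threads], MTU 65450": (21.7, 23.3),
}

for label, (old, new) in results.items():
    print(f"{label}: {old} -> {new} Gbit/s ({gain_percent(old, new):+.0f}%)")
```

This gives roughly +31%, +50% and +7% respectively, so the "up to 50%"
figure is the 8-thread, large-MTU wireguard case.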