* Performance of Wireguard on Infiniband 40G
From: Baptiste Jonglez @ 2017-05-13  7:37 UTC
To: wireguard

Hi,

Just for information, I did a quick test of Wireguard over a 40G
Infiniband network, between two machines with a Xeon E5520.

Using iperf (TCP mode) over the wireguard interface, performance was
around 1.6 Gbit/s. In bidirectional mode (iperf -d), performance was
700 Mbit/s + 800 Mbit/s.

Note that Infiniband has an MTU of 65520 bytes, but Wireguard still
selects an MTU of 1420 bytes for its interface. After raising the MTU
of the wireguard interface to 65450, performance went up to 7.6 Gbit/s
(unidirectional iperf).

Using the Infiniband network directly, iperf reaches 21.7 Gbit/s
(iperf maxes out the CPU at the receiver, even when using 8 threads).

Hardware used:

- Xeon E5520 @ 2.27 GHz (2 CPUs, 4 cores each)
- Mellanox ConnectX IB 4X QDR MT26428

Versions used:

- Debian jessie
- Linux 3.16.43-2
- Wireguard 0.0.20170421-2
- iperf 2.0.5
- Mellanox ConnectX InfiniBand driver v2.2-1

Baptiste
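A minimal sketch of how a setup like this could be reproduced; the
interface names, addresses, and keys below are placeholders rather than
the actual configuration used in this test (10.0.0.2 stands for the
other host's IPoIB address):

    # Create the WireGuard interface on each host (wg0, the addresses
    # and the peer key are hypothetical placeholders).
    ip link add wg0 type wireguard
    wg set wg0 private-key /etc/wireguard/privatekey \
        peer <PEER_PUBLIC_KEY> endpoint 10.0.0.2:51820 \
        allowed-ips 10.1.0.2/32
    ip address add 10.1.0.1/24 dev wg0
    ip link set wg0 up

    # Raise the MTU from the 1420-byte default towards Infiniband's
    # 65520-byte link MTU, leaving room for tunnel overhead:
    ip link set wg0 mtu 65450

    # Measure: run `iperf -s` on one host, then on the other:
    iperf -c 10.1.0.1        # unidirectional TCP test
    iperf -c 10.1.0.1 -d     # bidirectional (dual) test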
* Re: Performance of Wireguard on Infiniband 40G
From: Jason A. Donenfeld @ 2017-05-13 22:52 UTC
To: Baptiste Jonglez; +Cc: WireGuard mailing list

Hey Baptiste,

Awesome test! Thanks for reporting the results.

On Sat, May 13, 2017 at 9:37 AM, Baptiste Jonglez
<baptiste@bitsofnetworks.org> wrote:
> Using iperf (TCP mode) over the wireguard interface, performance was
> around 1.6 Gbit/s. In bidirectional mode (iperf -d), performance was
> 700 Mbit/s + 800 Mbit/s.

Indeed, the current multicore algorithm has a lot of issues. Samuel,
CCed, is going to be doing some work on optimizing this algorithm this
summer.

> After raising the MTU of the wireguard interface to 65450, performance
> went up to 7.6 Gbit/s (unidirectional iperf).

It makes sense that it'd be higher, since CPUs work best when running
uninterrupted, but this still indicates that padata is a very
suboptimal algorithm. Expect some improvements on this in the coming
months. Hopefully you'll be able to test on similar hardware at some
point when things are finished.

> Note that Infiniband has an MTU of 65520 bytes, but Wireguard still
> selects an MTU of 1420 bytes for its interface.

Yeah, the 1420 is just a hard-coded default. I'll probably add
something clever to autoselect an MTU when configuring the first peer's
first endpoint (by computing the route, taking its interface's MTU, and
subtracting the tunnel overhead), but the long-term solution, I think,
will be to do some more clever PMTU handling from within WireGuard. I'm
still working out exactly how to do this, but it should be possible.

> - Xeon E5520 @ 2.27 GHz (2 CPUs, 4 cores each)
> - Mellanox ConnectX IB 4X QDR MT26428

*drools* That's some awesome hardware!

> - iperf 2.0.5

iperf2 has the -d bidirectional mode, which is nice, but it seems like
most people are using iperf3 now. Out of curiosity, is there a reason
for preferring iperf2, beyond the -d switch?

Jason
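The route-based subtraction Jason describes can be approximated by hand
today; a hedged sketch follows (the variable names and parsing are
hypothetical, not anything WireGuard ships). The 80-byte figure is
WireGuard's worst-case per-packet overhead over an IPv6 endpoint
(40-byte outer IP + 8-byte UDP + 32 bytes of WireGuard header and tag),
which is also where the 1420 default comes from on a 1500-byte link;
over IPv4 the overhead is only 60 bytes, which is why Baptiste's 65450
still fits under the 65520-byte Infiniband MTU:

    # Derive a WireGuard MTU from the route to a peer endpoint.
    # ENDPOINT is a placeholder for the peer's real address.
    ENDPOINT=203.0.113.7
    DEV=$(ip route get "$ENDPOINT" | grep -oP 'dev \K\S+')
    LINK_MTU=$(ip -o link show "$DEV" | grep -oP 'mtu \K[0-9]+')
    # Subtract the worst-case (IPv6) tunnel overhead of 80 bytes;
    # e.g. 1500 - 80 = 1420, the hard-coded default.
    ip link set wg0 mtu $((LINK_MTU - 80))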
* Re: Performance of Wireguard on Infiniband 40G
From: Baptiste Jonglez @ 2017-05-14  9:55 UTC
To: Jason A. Donenfeld; +Cc: WireGuard mailing list

On Sun, May 14, 2017 at 12:52:11AM +0200, Jason A. Donenfeld wrote:
> One small and unfortunate thought just occurred to me: I'm pretty sure
> the backport to really old kernels is much less efficient than newer
> kernels on the RX path, due to some missing core fast-path APIs in
> those old kernels. In particular, I had to wrap the UDP layer with
> some nasty hacks to get packets out, whereas newer kernels have an
> elegant API for that which integrates in the right place. Just a
> thought... I haven't actually done concrete measurements, though.

Good idea. I have redone the same tests with kernel 4.9.18 from
jessie-backports.

TL;DR: when switching from kernel 3.16 to 4.9, wireguard shows a 50%
performance gain in the most favourable case (large MTU). Also, iperf
is generally faster than iperf3 here, most likely because iperf3 has no
multi-threading.

The full results, still over Infiniband 40G, are:

- unidirectional iperf [1 thread] with 1420 MTU: 2.1 Gbit/s
  (instead of 1.6 Gbit/s with kernel 3.16)
- bidirectional iperf [1 thread] with 1420 MTU: 780 Mbit/s + 1.0 Gbit/s
  (instead of 700 Mbit/s + 800 Mbit/s with kernel 3.16)
- unidirectional iperf [8 threads] with 65450 MTU: 11.4 Gbit/s
  (instead of 7.6 Gbit/s with kernel 3.16)

Without wireguard, as a baseline:

- unidirectional iperf [8 threads] with 65450 MTU: 23.3 Gbit/s
  (instead of 21.7 Gbit/s with kernel 3.16)

So, the new kernel definitely improved performance: by 7% for bare
iperf, and by up to 50% for wireguard + iperf.

> > - iperf 2.0.5
>
> iperf2 has the -d bidirectional mode, which is nice, but it seems like
> most people are using iperf3 now. Out of curiosity, is there a reason
> for preferring iperf2, beyond the -d switch?

As I said, it was just a quick test (to see if it worked fine with
Jessie's 3.16 kernel), and iperf was already installed while iperf3 was
not. It turns out that iperf3 is actually slower in this setup, most
likely because iperf is multi-threaded while iperf3 is not.

For the baseline test (without wireguard):

- iperf [1 thread]: 13.7 Gbit/s
- iperf [8 threads]: 23.4 Gbit/s
- iperf3 [1 stream]: 16.8 Gbit/s
- iperf3 [8 streams]: 13.6 Gbit/s

This was with iperf 2.0.5 and iperf3 3.0.7 (jessie). Just to be sure, I
repeated it with more recent versions (iperf 2.0.9, iperf3 3.1.3):

- iperf [1 thread]: 13.6 Gbit/s
- iperf [8 threads]: 23.3 Gbit/s
- iperf3 [1 stream]: 16.8 Gbit/s
- iperf3 [8 streams]: 13.6 Gbit/s

So, the behaviour is the same: iperf is faster than iperf3 thanks to
multi-threading.

I also tested through wireguard:

- unidirectional iperf3 [1 stream] with 65450 MTU: 6.47 Gbit/s
  (instead of 6.42 Gbit/s with iperf [1 thread])
- unidirectional iperf3 [8 streams] with 65450 MTU: 10.9 Gbit/s
  (instead of 11.4 Gbit/s with iperf [8 threads])

> > - Xeon E5520 @ 2.27 GHz (2 CPUs, 4 cores each)
> > - Mellanox ConnectX IB 4X QDR MT26428
>
> *drools* That's some awesome hardware!

Well, it's not my hardware :) But it's not exactly new, it dates from
2009.
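For reference, the invocations behind these comparisons look roughly
like this (hosts and addresses are placeholders). iperf 2's -P spawns
one thread per connection, while iperf3's -P opens several streams that
are all serviced by a single thread, which matches the numbers above:

    # iperf 2: -P 8 runs 8 parallel client threads
    iperf -s                   # on the server
    iperf -c 10.1.0.1 -P 8     # on the client

    # iperf3: -P 8 opens 8 streams, but one thread handles all of
    # them, so it cannot spread the load across cores like iperf 2
    iperf3 -s                  # on the server
    iperf3 -c 10.1.0.1 -P 8    # on the client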
* Re: Performance of Wireguard on Infiniband 40G
From: Greg KH @ 2017-05-14 10:48 UTC
To: Baptiste Jonglez; +Cc: WireGuard mailing list

On Sun, May 14, 2017 at 11:55:52AM +0200, Baptiste Jonglez wrote:
> On Sun, May 14, 2017 at 12:52:11AM +0200, Jason A. Donenfeld wrote:
> > One small and unfortunate thought just occurred to me: I'm pretty
> > sure the backport to really old kernels is much less efficient than
> > newer kernels on the RX path, due to some missing core fast-path
> > APIs in those old kernels. In particular, I had to wrap the UDP
> > layer with some nasty hacks to get packets out, whereas newer
> > kernels have an elegant API for that which integrates in the right
> > place. Just a thought... I haven't actually done concrete
> > measurements, though.
>
> Good idea. I have redone the same tests with kernel 4.9.18 from
> jessie-backports.
>
> TL;DR: when switching from kernel 3.16 to 4.9, wireguard shows a 50%
> performance gain in the most favourable case (large MTU). Also, iperf
> is generally faster than iperf3 here, most likely because iperf3 has
> no multi-threading.

4.9 is 6 months old; I'd be curious whether 4.11 is any faster, given
the rate of change in the network stack :)

thanks,

greg k-h
* Re: Performance of Wireguard on Infiniband 40G
From: Jason A. Donenfeld @ 2017-05-14 10:49 UTC
To: Greg KH; +Cc: WireGuard mailing list

Hey Greg,

On Sun, May 14, 2017 at 12:48 PM, Greg KH <greg@kroah.com> wrote:
> 4.9 is 6 months old; I'd be curious whether 4.11 is any faster, given
> the rate of change in the network stack :)

I imagine it might be. I think the biggest bottleneck, in any case, is
still the poor algorithm in padata. Hopefully we'll get this sorted
with the help of Samuel's research this summer!

Jason
* Re: Performance of Wireguard on Infiniband 40G
From: Jason A. Donenfeld @ 2017-05-13 23:45 UTC
To: Baptiste Jonglez; +Cc: WireGuard mailing list

Hey again,

On Sat, May 13, 2017 at 9:37 AM, Baptiste Jonglez
<baptiste@bitsofnetworks.org> wrote:
> - Debian jessie
> - Linux 3.16.43-2

One small and unfortunate thought just occurred to me: I'm pretty sure
the backport to really old kernels is much less efficient than newer
kernels on the RX path, due to some missing core fast-path APIs in
those old kernels. In particular, I had to wrap the UDP layer with some
nasty hacks to get packets out, whereas newer kernels have an elegant
API for that which integrates in the right place. Just a thought... I
haven't actually done concrete measurements, though.

Jason