Development discussion of WireGuard
* Performance of Wireguard on Infiniband 40G
@ 2017-05-13  7:37 Baptiste Jonglez
  2017-05-13 22:52 ` Jason A. Donenfeld
  2017-05-13 23:45 ` Jason A. Donenfeld
  0 siblings, 2 replies; 6+ messages in thread
From: Baptiste Jonglez @ 2017-05-13  7:37 UTC (permalink / raw)
  To: wireguard


Hi,

Just for information, I did a quick test of Wireguard over a 40G
Infiniband network, between two machines with a Xeon E5520.

Using iperf (TCP mode) over the wireguard interface, performance was
around 1.6 Gbit/s.  In bidirectional mode (iperf -d), performance was
700 Mbit/s + 800 Mbit/s.

Note that Infiniband has an MTU of 65520 bytes, but Wireguard still
selects an MTU of 1420 bytes for its interface.
After raising the MTU of the wireguard interface to 65450, performance
went up to 7.6 Gbit/s (unidirectional iperf).
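
For reference, raising the MTU was nothing fancy, just the usual
ip-link call (with wg0 standing in for whatever the WireGuard
interface is named):

    $ ip link set dev wg0 mtu 65450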

Using the Infiniband network directly, iperf's performance is 21.7 Gbit/s
(iperf maxes out the CPU at the receiver, even when using 8 threads).
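
The iperf runs were of the usual client/server form, roughly (the
addresses here are placeholders):

    server$ iperf -s
    client$ iperf -c 10.0.0.1          # single TCP stream
    client$ iperf -c 10.0.0.1 -d       # bidirectional (dual test)
    client$ iperf -c 10.0.0.1 -P 8     # 8 parallel client threads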

Hardware used:

- Xeon E5520 @2.27GHz (2 CPUs, 4 cores each)
- Mellanox ConnectX IB 4X QDR MT26428

Versions used:

- Debian jessie
- Linux 3.16.43-2
- Wireguard 0.0.20170421-2
- iperf 2.0.5
- Mellanox ConnectX InfiniBand driver v2.2-1

Baptiste



* Re: Performance of Wireguard on Infiniband 40G
  2017-05-13  7:37 Performance of Wireguard on Infiniband 40G Baptiste Jonglez
@ 2017-05-13 22:52 ` Jason A. Donenfeld
  2017-05-14  9:55   ` Baptiste Jonglez
  2017-05-13 23:45 ` Jason A. Donenfeld
  1 sibling, 1 reply; 6+ messages in thread
From: Jason A. Donenfeld @ 2017-05-13 22:52 UTC (permalink / raw)
  To: Baptiste Jonglez; +Cc: WireGuard mailing list

Hey Baptiste,

Awesome test! Thanks for reporting the results.

On Sat, May 13, 2017 at 9:37 AM, Baptiste Jonglez
<baptiste@bitsofnetworks.org> wrote:
> Using iperf (TCP mode) over the wireguard interface, performance was
> around 1.6 Gbit/s.  In bidirectional mode (iperf -d), performance was
> 700 Mbit/s + 800 Mbit/s.

Indeed, the current multicore algorithm has a lot of issues. Samuel,
CC'd, is going to be doing some work on optimizing it this summer.

> After raising the MTU of the wireguard interface to 65450, performance
> went up to 7.6 Gbit/s (unidirectional iperf).

It makes sense that it'd be higher, since CPUs work best when running
uninterrupted, but it still indicates that padata is a very
suboptimal algorithm. Expect some improvements on this in the coming
months. Hopefully you'll be able to test on similar hardware at some
point once things are finished.

> Note that Infiniband has an MTU of 65520 bytes, but Wireguard still
> selects an MTU of 1420 bytes for its interface.

Yeah, the 1420 is just a hard-coded default. I could probably add
something clever to autoselect an MTU when configuring the first
peer's first endpoint (by computing the route, taking its interface's
MTU, and subtracting the overhead, etc.), but the long-term solution,
I think, will be to do some more clever PMTU handling from within
WireGuard. I'm still working out exactly how to do this, but it
should be possible.
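
For concreteness, the subtraction in question is just the tunnel's
encapsulation overhead; assuming the usual worst case of an IPv6
outer header, it works out to roughly:

    overhead = 40 (IPv6) + 8 (UDP) + 32 (WireGuard header + auth tag) = 80 bytes
             = 20 (IPv4) + 8 (UDP) + 32                               = 60 bytes
    wg MTU   = underlying MTU - overhead, e.g. 1500 - 80 = 1420

which is where the hard-coded 1420 comes from on a standard 1500-byte
link; the same arithmetic against your 65520-byte Infiniband MTU
lands in the ballpark of the 65450 you used.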


> - Xeon E5520 @2.27GHz (2 CPUs, 4 cores each)
> - Mellanox ConnectX IB 4X QDR MT26428

*drools* That's some awesome hardware!

> - iperf 2.0.5

iperf2 has the -d bidirectional mode, which is nice, but it seems like
most people are using iperf3 now. Out of curiosity, is there a reason
for preferring iperf2, beyond the -d switch?

Jason


* Re: Performance of Wireguard on Infiniband 40G
  2017-05-13  7:37 Performance of Wireguard on Infiniband 40G Baptiste Jonglez
  2017-05-13 22:52 ` Jason A. Donenfeld
@ 2017-05-13 23:45 ` Jason A. Donenfeld
  1 sibling, 0 replies; 6+ messages in thread
From: Jason A. Donenfeld @ 2017-05-13 23:45 UTC (permalink / raw)
  To: Baptiste Jonglez; +Cc: WireGuard mailing list

Hey again,

On Sat, May 13, 2017 at 9:37 AM, Baptiste Jonglez
<baptiste@bitsofnetworks.org> wrote:
> - Debian jessie
> - Linux 3.16.43-2

One small and unfortunate thought just occurred to me: I'm pretty
sure the backport to really old kernels is way less efficient on RX
than on newer kernels, due to some missing core fast-path APIs in the
old kernels. In particular, I had to wrap the UDP layer with some
nasty hacks to get packets out, whereas newer kernels have an elegant
API for that which integrates in the right place. Just a thought... I
haven't actually done concrete measurements, though.

Jason


* Re: Performance of Wireguard on Infiniband 40G
  2017-05-13 22:52 ` Jason A. Donenfeld
@ 2017-05-14  9:55   ` Baptiste Jonglez
  2017-05-14 10:48     ` Greg KH
  0 siblings, 1 reply; 6+ messages in thread
From: Baptiste Jonglez @ 2017-05-14  9:55 UTC (permalink / raw)
  To: Jason A. Donenfeld; +Cc: WireGuard mailing list


On Sun, May 14, 2017 at 12:52:11AM +0200, Jason A. Donenfeld wrote:
> One small and unfortunate thought just occurred to me: I'm pretty
> sure the backport to really old kernels is way less efficient on RX
> than on newer kernels, due to some missing core fast-path APIs in the
> old kernels. In particular, I had to wrap the UDP layer with some
> nasty hacks to get packets out, whereas newer kernels have an elegant
> API for that which integrates in the right place. Just a thought... I
> haven't actually done concrete measurements, though.

Good idea, I have redone the same setup with kernel 4.9.18 from
jessie-backports.
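
(For anyone wanting to do the same, the backports kernel can be
pulled in with something along the lines of

    # apt-get -t jessie-backports install linux-image-amd64

after enabling the jessie-backports repository.)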

TL;DR: when switching from kernel 3.16 to 4.9, wireguard has a 50%
performance gain in the most favourable case (large MTU).  Also, iperf
seems generally faster than iperf3, most likely because iperf3 has no
multi-threading.


The full results, still over Infiniband 40G, are:

- unidirectional iperf[1 thread] with 1420 MTU: 2.1 Gbit/s
  (instead of 1.6 Gbit/s with kernel 3.16)

- bidirectional iperf[1 thread] with 1420 MTU: 780 Mbit/s + 1.0 Gbit/s
  (instead of 700 Mbit/s + 800 Mbit/s with kernel 3.16)

- unidirectional iperf[8 threads] with 65450 MTU: 11.4 Gbit/s
  (instead of 7.6 Gbit/s with kernel 3.16)

Without wireguard, as a baseline:

- unidirectional iperf[8 threads] with 65450 MTU: 23.3 Gbit/s
  (instead of 21.7 Gbit/s with kernel 3.16)

So, the new kernel definitely improved performance: by 7% for iperf, and
by up to 50% for wireguard + iperf.

> > - iperf 2.0.5
> 
> iperf2 has the -d bidirectional mode, which is nice, but it seems like
> most people are using iperf3 now. Out of curiosity, is there a reason
> for preferring iperf2, beyond the -d switch?

As I said, it was just a quick test (to see if it worked fine with
Jessie's 3.16 kernel).  iperf was already installed but iperf3 was not.

It turns out that iperf3 is slower in this setup, most likely because
iperf is multi-threaded but iperf3 is not.  For the baseline test (without
wireguard):

- iperf[1 thread]:   13.7 Gbit/s
- iperf[8 threads]:  23.4 Gbit/s
- iperf3[1 stream]:  16.8 Gbit/s 
- iperf3[8 streams]: 13.6 Gbit/s

This was with iperf 2.0.5 and iperf3 3.0.7 (jessie).
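
The invocations were of the usual form, with the server address being
a placeholder:

    $ iperf  -c 10.0.0.1 -P 8    # -P 8 spawns 8 client threads
    $ iperf3 -c 10.0.0.1 -P 8    # 8 streams, but all in one thread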

Just to be sure, with more recent versions (iperf 2.0.9, iperf3 3.1.3):

- iperf[1 thread]:   13.6 Gbit/s
- iperf[8 threads]:  23.3 Gbit/s
- iperf3[1 stream]:  16.8 Gbit/s 
- iperf3[8 streams]: 13.6 Gbit/s

So, the behaviour is the same: iperf is faster than iperf3 thanks to
multi-threading.

I also tested through wireguard:

- unidirectional iperf3[1 stream] with 65450 MTU: 6.47 Gbit/s
  (instead of 6.42 Gbit/s with iperf[1 thread])

- unidirectional iperf3[8 streams] with 65450 MTU: 10.9 Gbit/s
  (instead of 11.4 Gbit/s with iperf[8 threads])

> > - Xeon E5520 @2.27GHz (2 CPUs, 4 cores each)
> > - Mellanox ConnectX IB 4X QDR MT26428
> 
> *drools* That's some awesome hardware!

Well, it's not my hardware :)  But it's not exactly new: it dates
from 2009.



* Re: Performance of Wireguard on Infiniband 40G
  2017-05-14  9:55   ` Baptiste Jonglez
@ 2017-05-14 10:48     ` Greg KH
  2017-05-14 10:49       ` Jason A. Donenfeld
  0 siblings, 1 reply; 6+ messages in thread
From: Greg KH @ 2017-05-14 10:48 UTC (permalink / raw)
  To: Baptiste Jonglez; +Cc: WireGuard mailing list

On Sun, May 14, 2017 at 11:55:52AM +0200, Baptiste Jonglez wrote:
> On Sun, May 14, 2017 at 12:52:11AM +0200, Jason A. Donenfeld wrote:
> > One small and unfortunate thought just occurred to me: I'm pretty
> > sure the backport to really old kernels is way less efficient on RX
> > than on newer kernels, due to some missing core fast-path APIs in the
> > old kernels. In particular, I had to wrap the UDP layer with some
> > nasty hacks to get packets out, whereas newer kernels have an elegant
> > API for that which integrates in the right place. Just a thought... I
> > haven't actually done concrete measurements, though.
> 
> Good idea, I have redone the same setup with kernel 4.9.18 from
> jessie-backports.
> 
> TL;DR: when switching from kernel 3.16 to 4.9, wireguard has a 50%
> performance gain in the most favourable case (large MTU).  Also, iperf
> seems generally faster than iperf3, most likely because iperf3 has no
> multi-threading.

4.9 is 6 months old; I'd be curious whether 4.11 is any faster, given
the rate of change in the network stack :)

thanks,

greg k-h


* Re: Performance of Wireguard on Infiniband 40G
  2017-05-14 10:48     ` Greg KH
@ 2017-05-14 10:49       ` Jason A. Donenfeld
  0 siblings, 0 replies; 6+ messages in thread
From: Jason A. Donenfeld @ 2017-05-14 10:49 UTC (permalink / raw)
  To: Greg KH; +Cc: WireGuard mailing list

Hey Greg,

On Sun, May 14, 2017 at 12:48 PM, Greg KH <greg@kroah.com> wrote:
> 4.9 is 6 months old, I'd be curious if 4.11 is any faster given the rate
> of change in the network stack :)

I imagine it might be. I think the biggest bottleneck, in any case,
is still the poor algorithm in padata. Hopefully we'll get this sorted
with the help of Samuel's research this summer!

Jason


