* Poor performance under high load
@ 2018-06-26 7:49 Maximilian Pudelko
2018-06-26 17:57 ` Jason A. Donenfeld
0 siblings, 1 reply; 5+ messages in thread
From: Maximilian Pudelko @ 2018-06-26 7:49 UTC (permalink / raw)
To: wireguard
Hello WireGuard list,
as part of my research I am evaluating the performance of
WireGuard and found a curious pattern under increasing load: up to
~0.55 Mpps WireGuard can keep up with encrypting and forwarding, but
after that the rate decreases while the CPU load keeps increasing,
until all cores are 100% utilized and _no_ packets get sent.
Is that expected behavior due to the unoptimized implementation that
uses a ring buffer with spinlocks? perf shows that a lot of time is
spent waiting for those locks.
Is there a simple way to get better performance? Multiple connections?
Measurement graph:
https://gist.github.com/pudelkoM/2f216e7eb820fc5dc898eaea119448e5
Test configuration for reference:
- WireGuard 0.0.20180531
- Point-to-point setup with 2 hosts
- Linux 4.4.0-78-generic
- Intel(R) Xeon(R) CPU E5-2620 v3 @ 2.40GHz
- 2x Intel XL710 NICs
- Traffic: 64 byte UDP packets, 1000 parallel flows by src port randomization
Regards
Max
* Re: Poor performance under high load
2018-06-26 7:49 Poor performance under high load Maximilian Pudelko
@ 2018-06-26 17:57 ` Jason A. Donenfeld
2018-07-02 12:22 ` Maximilian Pudelko
0 siblings, 1 reply; 5+ messages in thread
From: Jason A. Donenfeld @ 2018-06-26 17:57 UTC (permalink / raw)
To: maximilian.pudelko; +Cc: WireGuard mailing list
Hi Max,
Thanks for doing this test; that's super useful. What you're
describing is definitely not expected behavior. Think you could try
the same test with 0.0.20180620 and 0.0.20180625? In particular, I'm
interested to know whether a performance _regression_ introduced in
0.0.20180620 actually results in the correct behavior.
Meanwhile, we (CC'd) have been working on implementing a lockfree
queue structure, but we haven't seen any configurations yet where this
actually results in a performance improvement.
Care to share your benchmark scripts? Sounds like this could be really
useful for directing our optimizations.
Jason
* Re: Poor performance under high load
2018-06-26 17:57 ` Jason A. Donenfeld
@ 2018-07-02 12:22 ` Maximilian Pudelko
2018-07-09 19:23 ` Jason A. Donenfeld
0 siblings, 1 reply; 5+ messages in thread
From: Maximilian Pudelko @ 2018-07-02 12:22 UTC (permalink / raw)
To: Jason A. Donenfeld; +Cc: WireGuard mailing list
Hi Jason,
>try the same test with 0.0.20180620 and 0.0.20180625
The Ubuntu PPA only contains version 0.0.20180625 as far as I can see
(apt-cache madison wireguard), so I only measured that version.
It's a bit (+0.1 Mpps) faster across the board and drops to zero
later (at ~2.5 Mpps offered load). See the graph for details:
https://github.com/pudelkoM/MoonWire/blob/master/benchmarks/wireguard/results/0.0.20180625/encrypt-64.pdf
>Care to share your benchmark scripts?
No problem, but I doubt they are easy to integrate into a build
pipeline: they depend on libmoon (a Lua wrapper for DPDK), require at
least 10 Gbit NICs, and involve some manual data collection.
https://github.com/pudelkoM/MoonWire/tree/master/benchmarks
FYI: I'm also working on a WireGuard prototype based on DPDK to see
the performance impact of different network stacks. A very early
version that just receives, encrypts, and forwards packets reaches
around 1.4 Mpps _on a single core_, so it's pretty promising if that
can be scaled up. But it's far from done (no handshakes, hardcoded
keys, single session, ...). See the same repository for the source.
Max
Thread overview: 5+ messages
2018-06-26 7:49 Poor performance under high load Maximilian Pudelko
2018-06-26 17:57 ` Jason A. Donenfeld
2018-07-02 12:22 ` Maximilian Pudelko
2018-07-09 19:23 ` Jason A. Donenfeld
2018-07-09 20:53 ` logcabin