Hello,
I've been evaluating WireGuard as a replacement for a setup that uses OpenVPN. Initial tests look promising in terms of system resources (much less CPU than OpenVPN), but I'm seeing a fair amount of packet loss and I can't work out why.
The scenario is a public API endpoint that devices ping with reasonably hefty payloads. The payload is received by nginx, which proxies it over a tunnel (via the public network) to a server downstream.
WireGuard version is 0.0.20190406-1.
The test server is an Intel i5-4460 running Debian, with 4.19.0-5-amd64 kernel.
load average: 2.18, 2.12, 2.12
%Cpu(s): 25.2 us, 3.0 sy, 0.0 ni, 68.3 id, 0.0 wa, 0.0 hi, 3.5 si, 0.0 st
So basically, the traffic isn't exceptionally heavy, it's pretty stable in volume, and the machine isn't doing anything else.
Looking at the wg0 interface, I see it dropping a fair number of RX packets. Doing some maths with the counters in /sys/class/net/wg0/statistics, the interface is receiving about 600KB/sec at around 5000pps. The RX dropped counter is rising at about 120-150pps (between 2-3%), and this shows up as an error to the sender, which then has to explicitly retry (this is how I became aware of the problem in the first place).
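For reference, the maths is just counter deltas over a sampling interval, something like this (a rough sketch; interface name and interval are just parameters I picked):

```shell
#!/bin/sh
# Sample RX byte/packet/drop rates for an interface from the
# /sys/class/net counters. Usage: ./rx-rate.sh [iface] [seconds]
IFACE="${1:-wg0}"
INTERVAL="${2:-10}"
STATS="/sys/class/net/$IFACE/statistics"

b0=$(cat "$STATS/rx_bytes"); p0=$(cat "$STATS/rx_packets"); d0=$(cat "$STATS/rx_dropped")
sleep "$INTERVAL"
b1=$(cat "$STATS/rx_bytes"); p1=$(cat "$STATS/rx_packets"); d1=$(cat "$STATS/rx_dropped")

# Average rates over the interval.
echo "RX: $(( (b1 - b0) / INTERVAL )) B/s, $(( (p1 - p0) / INTERVAL )) pps, $(( (d1 - d0) / INTERVAL )) drops/s"
```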
The underlying eth0 interface isn't seeing a single packet dropped or any errors.
eth0 MTU is 1500, wg0 MTU is 1420 (I haven't touched these).
I've tried raising txqueuelen, and raising net.core.rmem_max and net.core.rmem_default to stupidly high values, with zero difference.
I've also tried setting net.ipv4.tcp_rmem='16384 33554432 67108864' and increasing net.core.netdev_max_backlog and net.ipv4.udp_mem, but nothing changes. So rather than trying even more random changes, I'm wondering if anybody recognises the symptoms and knows the fix. I think that covers it, but feel free to ask for other metrics.
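In case it helps with diagnosis: as I understand it, if netdev_max_backlog were the limit being hit, the drops should appear in the second column of /proc/net/softnet_stat (per-CPU hex counters, column 2 being packets dropped because the input backlog queue was full). Something like this decodes it:

```shell
#!/bin/sh
# Print per-CPU processed/dropped counts from /proc/net/softnet_stat.
# Each row is one CPU; fields are hex. Column 1 = packets processed,
# column 2 = packets dropped due to netdev_max_backlog overflow.
i=0
while read -r line; do
    set -- $line
    # printf %d accepts the 0x prefix as a C-style hex constant.
    printf 'cpu%d: processed=%d dropped=%d\n' "$i" "0x$1" "0x$2"
    i=$(( i + 1 ))
done < /proc/net/softnet_stat
```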
The exact same machine running OpenVPN dropped nothing (although user CPU was closer to 60%).
Thanks,
Ian.