Wg source address is too sticky for multihomed systems aka multiple endpoints redux

* Wg source address is too sticky for multihomed systems aka multiple endpoints redux
From: Daniel Gröber @ 2023-07-21  0:06 UTC (permalink / raw)
  To: wireguard; +Cc: Jason A. Donenfeld, Baptiste Jonglez, Nico Schottelius

Hi wire-guard, :)

tl;dr: I want to implement multiple peer endpoints to fix the only two
problems haunting me with wireguard.

I have a multihomed router with two public IPv4 addresses plus default
routes in a failover configuration. The setup includes the two default
routes with different metrics and appropriate ip-rule(s) to make traffic
with a preselected source address leave via the correct interface.
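
For illustration, a setup like the one described could look roughly like
the sketch below. All interface names, addresses, gateways, metrics and
table numbers are hypothetical placeholders (documentation ranges), not the
values from the actual router. The `run` helper only prints the commands
unless APPLY=1 is set.

```shell
#!/bin/sh
# Dry-run helper: print the command unless APPLY=1 is set in the environment.
run() { if [ "${APPLY:-0}" = 1 ]; then "$@"; else echo "$@"; fi; }

# One uplink = one default route in the main table (ordered by metric), one
# default route in a per-uplink table, and one ip-rule that sends traffic
# with a preselected source address to that table.
setup_uplink() {
    dev=$1 gw=$2 src=$3 metric=$4 table=$5
    run ip route add default via "$gw" dev "$dev" metric "$metric"
    run ip route add default via "$gw" dev "$dev" table "$table"
    run ip rule add from "$src" lookup "$table"
}

# Placeholder values: the primary uplink gets the smaller metric.
setup_uplink eth0  192.0.2.1    192.0.2.10    100 100
setup_uplink wwan0 198.51.100.1 198.51.100.10 200 200
```

With both uplinks up, unpinned traffic follows the metric 100 route, while
anything already bound to 198.51.100.10 keeps leaving via wwan0.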

On top of this v4 underlay I run a number of wireguard interfaces providing
IPv6 service for my network. Since one of the v4 uplinks is an LTE/5G
router the main uplink is usually preferable and the (default) route
metrics reflect this.

However I've observed wireguard continuing to send traffic via the larger
metric default route after failover events, even after the primary link and
its default route are back.

Source address issues on multihomed hosts have been discussed on the
list multiple times before. See for example:
- https://lists.zx2c4.com/pipermail/wireguard/2023-February/007948.html
- https://lists.zx2c4.com/pipermail/wireguard/2021-October/007205.html
- https://lists.zx2c4.com/pipermail/wireguard/2021-November/007309.html

So I'm certainly not the only one experiencing issues like this.

I set out on a quest to debug this. My first reading of the code indicated
that perhaps the dst_cache is at fault but after adding some tracing code
it became clear that our endpoint logic is simply broken for multihomed
systems:

The dst_cache gets properly invalidated whenever route switchover happens
but when doing a new rt lookup we force the lookup to use the (known good)
src address.

This is deficient because if we run a full route lookup we might get a
different source address (as is the case in my setup). I do think I
understand why we do things like this: we know this source address is
working and the new one could break connectivity. Fair enough.
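
The difference between the two kinds of lookup can be observed from
userspace with `ip route get`. This is only a sketch of the idea, not what
the kernel module executes; the `route_field` helper and the addresses in
the usage comment are made up for illustration.

```shell
#!/bin/sh
# Print the word following KEY in `ip route get` output read from stdin.
route_field() {
    awk -v key="$1" '{ for (i = 1; i < NF; i++) if ($i == key) { print $(i + 1); exit } }'
}

# Full lookup: the kernel picks both the route and the source address.
full_lookup_src() { ip -4 route get "$1" | route_field src; }

# Pinned lookup: the source address is forced, so ip-rules matching on
# "from" decide the outgoing device. This is roughly the behaviour
# described above for the remembered (known good) src.
pinned_lookup_dev() { ip -4 route get "$1" from "$2" | route_field dev; }

# Hypothetical usage, with placeholder peer and source addresses:
#   full_lookup_src   203.0.113.7
#   pinned_lookup_dev 203.0.113.7 198.51.100.10
```

If the full lookup reports the primary uplink's address while the pinned
lookup still resolves to the backup device, the host is in the sticky state
described here.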

So here's a proposal: we introduce a second wg_peer endpoint address for
use with handshakes. This way we can send a handshake using the new source
address and only switch if it succeeds.

I do expect this to be a fair bit of additional logic since we need to deal
with timeouts, retries and such. However I think this is a good opportunity
to kill two birds with one stone. Hear me out.

I have a second issue with wireguard that's been bugging me for ages:
IPv4/6 non-dual-stack support _sucks_. The kernel only knows about one
endpoint address ever so if an endpoint (DNS) host resolves to multiple
addresses there's nothing userspace can easily do to make things work on
IPv4-only *and* IPv6-only networks.
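
To make the limitation concrete: a hostname may resolve to several
addresses, but `wg set <iface> peer <key> endpoint <addr>:<port>` accepts
only one of them at a time. The sketch below merely lists the candidates
userspace would have to choose between. It assumes glibc's `getent`; the
function names, hostname and port are placeholders.

```shell
#!/bin/sh
# List the unique addresses a hostname resolves to (assumes glibc getent).
resolve_all() { getent ahosts "$1" | awk '{ print $1 }' | sort -u; }

# Format ADDR PORT as a wg endpoint; IPv6 literals need square brackets.
format_endpoint() {
    case $1 in
        *:*) printf '[%s]:%s\n' "$1" "$2" ;;
        *)   printf '%s:%s\n' "$1" "$2" ;;
    esac
}

# Print one candidate endpoint per resolved address.
candidate_endpoints() {
    resolve_all "$1" | while read -r addr; do
        format_endpoint "$addr" "$2"
    done
}

# Hypothetical usage (vpn.example.net and 51820 are placeholders):
#   candidate_endpoints vpn.example.net 51820
# Only one of the printed lines can be configured on the peer at a time.
```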

This is kind of the same problem we're having with multihoming though: if
only wireguard could keep track of multiple endpoints (think: dst+src
address pairs).

So my proposal is to just add support for multiple endpoints. There is only
ever one endpoint involved in sending user data but we attempt handshakes
over all endpoints. (Exact logic TBD)

To fix the multihoming issue we then check if the socket.c:sendX rt lookup
returns a different src address from what we're expecting. If it does, we
clone the current (dst) endpoint with the new source address and kick off a
handshake over it.

Note "multiple endpoints" was suggested before in "[RFC] Handling multiple
endpoints for a single peer" and I agree with most of the design spec
presented in it:
- https://lists.zx2c4.com/pipermail/wireguard/2017-January/000917.html

I would perhaps not go as far as to introduce fancy RTT measurement. Me
personally, I have a proper routing daemon (babeld) in userspace using an
RTT metric for that. No need to do this in kernel.

The ability to send "out-of-band" packets to a particular peer mentioned by
Jason in the above mail would actually help routing daemons to cover the
entire failover story as that's the only limitation currently: I need one
wg tunnel per-peer to do routing but I digress.

Let me know what y'all think, I'd like to start hacking/designing this
ASAP. These things have been the only pain point in an otherwise stellar
user experience with wireguard!

Thanks,
--Daniel

PS: I have found one viable workaround for this source stickiness. `wg set
$iface fwmark $mark` will reset all peer src addresses, but it doesn't stick
at high packet rates because (I think) the incoming packets immediately
overwrite the src address in wg_packet_consume_data_done() via
wg_socket_set_peer_endpoint(). So you have to do it a couple of times
(perhaps in a tight loop) for it to un-stick the source address :)
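
A minimal sketch of that tight loop follows. The function name, the retry
count of 20 and the example values are arbitrary assumptions, and `run`
only prints the commands unless APPLY=1 is set.

```shell
#!/bin/sh
# Dry-run helper: print the command unless APPLY=1 is set in the environment.
run() { if [ "${APPLY:-0}" = 1 ]; then "$@"; else echo "$@"; fi; }

# Re-apply the fwmark TRIES times in a row, since a single reset may be
# overwritten immediately by incoming packets.
unstick_src() {
    iface=$1 mark=$2 tries=${3:-20}
    i=0
    while [ "$i" -lt "$tries" ]; do
        run wg set "$iface" fwmark "$mark"
        i=$((i + 1))
    done
}

# Hypothetical usage (wg0 and 0x1 are placeholders):
#   APPLY=1 unstick_src wg0 0x1
```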


* Re: Wg source address is too sticky for multihomed systems aka multiple endpoints redux
From: Nico Schottelius @ 2023-07-21  7:31 UTC (permalink / raw)
  To: Daniel Gröber
  Cc: wireguard, Jason A. Donenfeld, Baptiste Jonglez, Nico Schottelius

Good morning,

Daniel Gröber <dxld@darkboxed.org> writes:
> [...]
> I have a multihomed router [...]

Following up on the thread from February, we migrated away from wireguard
to openvpn on systems that are multi homed.

The main reason for that is that the following type of connection fails to
work with high probability:

1) device -> [NAT/FIREWALL] -> multi homed server [IP A]
2) multi homed server [IP B] -- blocked by firewall as it does not match
table entry

This always happens when the server has an asymmetric route back to
the originating device, which really depends on the routing tables
or routing policy present on the multi homed server.

I'm a big fan of simplicity, but without an equivalent of openvpn's
"local" statement, wireguard is deemed to be unusable in many network
scenarios.

Best regards,

Nico

--
Sustainable and modern Infrastructures by ungleich.ch


* Re: Wg source address is too sticky for multihomed systems aka multiple endpoints redux
From: John Lauro @ 2023-07-21 13:47 UTC (permalink / raw)
  To: Nico Schottelius
  Cc: Daniel Gröber, wireguard, Jason A. Donenfeld, Baptiste Jonglez

I have lots of multihomed routers set up for site-to-site VPN and
running bgp over the vpn mesh.

First, make sure these are all 0, as the hosts are multihomed.
cat $( find /proc/sys/net/ipv4 -name rp_filter )

The other thing I do is I run a different wireguard interface and peer
on a different port and interface.

With bgp on top, one multihomed router to another multihomed router
just ends up being multiple links it can route over and let linux/bgp
decide which ones to use and automatically fail over if one path goes
down.

That said, I don't have any NAT and both ends have fixed IPs, although
they are multihomed.

Can you create a separate wireguard interface for each physical
interface (I suggest a different port too)?  Separate wireguard
interfaces should keep WG from having issues, and of course disabling
rp_filter to keep linux from having issues.
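
One way this suggestion could be set up is sketched below. Tying each
wireguard interface to an uplink through a per-interface fwmark and a
matching ip-rule is an assumption of this sketch, not something spelled out
above. Interface names, ports, marks and table numbers are placeholders,
and each table is assumed to already hold a default route via the matching
uplink. `run` only prints the commands unless APPLY=1 is set.

```shell
#!/bin/sh
# Dry-run helper: print the command unless APPLY=1 is set in the environment.
run() { if [ "${APPLY:-0}" = 1 ]; then "$@"; else echo "$@"; fi; }

# One wireguard interface per uplink, each with its own listen port and
# fwmark; the ip-rule steers that interface's encrypted packets into the
# routing table of the matching uplink.
setup_wg_uplink() {
    wgif=$1 port=$2 mark=$3 table=$4
    run ip link add dev "$wgif" type wireguard
    run wg set "$wgif" listen-port "$port" fwmark "$mark"
    run ip rule add fwmark "$mark" lookup "$table"
}

# Disable reverse-path filtering. The kernel uses the maximum of the "all"
# and per-interface values, so per-interface entries may need to be 0 too.
run sysctl -w net.ipv4.conf.all.rp_filter=0
run sysctl -w net.ipv4.conf.default.rp_filter=0

# Placeholder values for a cable and an LTE uplink.
setup_wg_uplink wg-cable 51820 0x1 100
setup_wg_uplink wg-lte   51821 0x2 200
```

Keys and peers still have to be configured on each interface separately.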



* Re: Wg source address is too sticky for multihomed systems aka multiple endpoints redux
From: Daniel Gröber @ 2023-07-23 17:05 UTC (permalink / raw)
  To: John Lauro
  Cc: Nico Schottelius, wireguard, Jason A. Donenfeld, Baptiste Jonglez

Hi John,

On Fri, Jul 21, 2023 at 09:47:11AM -0400, John Lauro wrote:
> I have lots of multihomed routers set up for site-to-site VPN and
> running bgp over the vpn mesh.
> 
> First, make sure these are all 0, as the hosts are multihomed.
> cat $( find /proc/sys/net/ipv4 -name rp_filter )

My routers are behind consumer ISPs so I never get packets which would fail
RPF and I have RPF upstream of me either way, so this doesn't make a
difference in my case. Like I said I have ip-rules (PBR) to direct traffic
to the correct interface based on source address to appease upstream's RPF.

> The other thing I do is I run a different wireguard interface and peer
> on a different port and interface.

Same, in order to run a routing daemon on top of wg you pretty much have to
do that currently as only one peer may have AllowedIPs=::/0 but the routing
daemons don't (yet, I'm working on this for babel) know how to update
AllowedIPs.

> With bgp on top, one multihomed router to another multihomed router
> just ends up being multiple links it can route over and let linux/bgp
> decide which ones to use and automatically fail over if one path goes
> down.
> 
> That said, I don't have any NAT and both ends have fixed IPs, although
> they are multihomed.

I'm pretty sure you're not seeing the problem I describe here because your
paths are going to be pretty equivalent, but in my case one is DOCSIS3 and
one is LTE/5G (depends on weather) which is much worse in terms of
bandwidth and latency/jitter consistency. So I can actually see the
difference in applications (video buffering etc) which is what had me start
debugging in the first place :)

> Can you create a separate wireguard interface for each physical
> interface (I suggest a different port too).  Separate wireguard
> interfaces should keep WG from having issues, and of course disabling
> rp_filter to keep linux from having issues.

Hmm, that might just work since my routing daemon does RTT based routing
and the mobile connection is going to be much worse there. I already have
to deploy two tunnels because of the mentioned v4/v6 dualstack issue so I'm
not really keen to multiply that number _again_. Besides my `set fwmark`
workaround does actually legitimately work but it's ugly as hell :)

> On Fri, Jul 21, 2023 at 4:05 AM Nico Schottelius

/me realizes you were replying to Nico *blush*. See this is why you don't
top-post. Learn some netiquette people :-)

I've actually taken my followup discussion with Nico off-list because I
think it might be a more involved debug session on what's going on in his
setup, which is going to distract from my proposal. I'll send any
conclusions we come to back to the list though.

FYI: I do have a patch to add the necessary debugging code and logs to show
the concrete issue here, I just didn't want to cause information overload
in the initial mail. Just let me know and I'll send those along if there's
any doubt about whether what I describe is the actual issue I'm having. I'm
pretty convinced but the first rule of the internet is that the problem is
always the X-Y problem~.

Thanks,
--Daniel
