From: Daniel Gröber
To: wireguard@lists.zx2c4.com
Cc: "Jason A. Donenfeld", Baptiste Jonglez, Nico Schottelius
Date: Fri, 21 Jul 2023 02:06:43 +0200
Subject: Wg source address is too sticky for multihomed systems aka multiple endpoints redux
Message-ID: <20230721000643.44y5pd7sfcjzhbjw@House.clients.dxld.at>

Hi wire-guard, :)

tl;dr: I want to implement multiple peer endpoints to fix the only two problems haunting me with wireguard.

I have a multihomed router with two public IPv4 addresses plus default routes in a failover configuration. The setup includes the two default routes with different metrics and appropriate ip-rule(s) to make traffic with a preselected source address leave via the correct interface.
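For readers less familiar with this kind of setup, here is a toy C model of the routing decision. It is not kernel code: `struct uplink_route`, `lookup_default()`, the interface names and the documentation-range addresses in the usage example are all hypothetical. An unpinned lookup picks the usable default route with the lowest metric; a lookup with a preselected source address models the ip rule and only considers the uplink that owns that address.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model of one uplink's default route (hypothetical, not kernel code). */
struct uplink_route {
    const char *oif;   /* outgoing interface, e.g. "wan0" (hypothetical) */
    uint32_t metric;   /* lower metric wins */
    uint32_t prefsrc;  /* public IPv4 address owned by this uplink */
    bool up;           /* is the link/route currently usable? */
};

/*
 * saddr == 0: no source preselected, pick the usable default route with
 *             the lowest metric (plain metric-based failover).
 * saddr != 0: model the ip rule that sends traffic with a preselected
 *             source address out of the uplink owning that address.
 * Returns NULL when no usable route matches.
 */
static const struct uplink_route *
lookup_default(const struct uplink_route *rt, size_t n, uint32_t saddr)
{
    const struct uplink_route *best = NULL;

    assert(rt != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (!rt[i].up)
            continue;                 /* link down, route unusable */
        if (saddr != 0 && rt[i].prefsrc != saddr)
            continue;                 /* ip rule: wrong uplink for this source */
        if (best == NULL || rt[i].metric < best->metric)
            best = &rt[i];
    }
    return best;
}
```

With 192.0.2.1 on the primary uplink (metric 100) and 198.51.100.1 on the LTE/5G uplink (metric 200), an unpinned lookup falls back to the LTE route while the primary is down and returns to the primary once it recovers. A lookup pinned to the LTE source, however, keeps selecting the LTE route regardless, which is the stickiness described below.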
On top of this v4 underlay I run a number of wireguard interfaces providing IPv6 service for my network. Since one of the v4 uplinks is an LTE/5G router, the main uplink is usually preferable and the (default) route metrics reflect this. However, I've observed wireguard continuing to send traffic via the higher-metric default route after failover events, even after the primary link and its default route are back.

Source address issues on multihomed hosts have been discussed on the list multiple times before. See for example:

- https://lists.zx2c4.com/pipermail/wireguard/2023-February/007948.html
- https://lists.zx2c4.com/pipermail/wireguard/2021-October/007205.html
- https://lists.zx2c4.com/pipermail/wireguard/2021-November/007309.html

So I'm certainly not the only one experiencing issues like this.

I set out on a quest to debug this. My first reading of the code indicated that perhaps the dst_cache is at fault, but after adding some tracing code it became clear that our endpoint logic is simply broken for multihomed systems: the dst_cache gets properly invalidated whenever a route switchover happens, but when doing a new rt lookup we force the lookup to use the (known good) src address. This is deficient because a full route lookup might return a different source address (as is the case in my setup).

I do think I understand why we do things like this: we know this source address is working and the new one could break connectivity. Fair enough.

So here's a proposal: we introduce a second wg_peer endpoint address for use with handshakes. This way we can send a handshake using the new source address and only switch if it succeeds. I do expect this to be a fair bit of additional logic since we need to deal with timeouts, retries and such. However, I think this is a good opportunity to kill two birds with one stone. Hear me out.

I have a second issue with wireguard that's been bugging me for ages: IPv4/6 non-dual-stack support _sucks_.
The kernel only knows about one endpoint address at a time, so if an endpoint (DNS) host resolves to multiple addresses there's nothing userspace can easily do to make things work on IPv4-only *and* IPv6-only networks. This is kind of the same problem we're having with multihoming, though: if only wireguard could keep track of multiple endpoints (think: dst+src address pairs).

So my proposal is to just add support for multiple endpoints. There is only ever one endpoint involved in sending user data, but we attempt handshakes over all endpoints (exact logic TBD). To fix the multihoming issue we then check whether the socket.c:sendX rt lookup returns a different src address from what we're expecting. If so, we clone the current (dst) endpoint with the new source address and kick off a handshake over it.

Note that "multiple endpoints" was suggested before in "[RFC] Handling multiple endpoints for a single peer" and I agree with most of the design spec presented in it:

- https://lists.zx2c4.com/pipermail/wireguard/2017-January/000917.html

I would perhaps not go as far as to introduce fancy RTT measurement. Personally, I have a proper routing daemon (babeld) in userspace using an RTT metric for that. No need to do this in the kernel.

The ability to send "out-of-band" packets to a particular peer, mentioned by Jason in the above mail, would actually help routing daemons cover the entire failover story, as that's the only limitation currently: I need one wg tunnel per peer to do routing. But I digress.

Let me know what y'all think; I'd like to start hacking on/designing this ASAP. These things have been the only pain point in an otherwise stellar user experience with wireguard!

Thanks,
--Daniel

PS: I have found one viable workaround for this source stickiness.
`wg set $iface fwmark $mark` will reset all peer src addresses, but it doesn't stick at high packet rates because (I think) the incoming packets immediately overwrite the src address in wg_packet_consume_data_done() via wg_socket_set_peer_endpoint(). So you have to do it a couple of times (perhaps in a tight loop) for it to un-stick the source address :)
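The multiple-endpoint proposal above could look roughly like the following C sketch. It is a hypothetical, heavily simplified model, not WireGuard code: `struct ep`, `struct peer_eps`, `MAX_EPS` and the `peer_*()` helpers are invented for illustration. Addresses are IPv4-only `uint32_t` values for brevity (the dual-stack half of the proposal would need both address families), and the timeouts, retries and pruning of dead endpoints are left out. Only the active endpoint carries user data; when the send path's route lookup returns a different source address, the active endpoint is cloned with that source as a handshake candidate, and the candidate is promoted only once a handshake over it succeeds.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_EPS 4  /* hypothetical per-peer endpoint limit */

/* Hypothetical endpoint: a (dst, src) address pair, IPv4 only for brevity. */
struct ep {
    uint32_t dst;
    uint32_t src;
};

/* Hypothetical per-peer state: several endpoints, one active for user data. */
struct peer_eps {
    struct ep eps[MAX_EPS];
    size_t n;       /* number of known endpoints */
    size_t active;  /* index of the endpoint used for user data */
};

/* Return the index of an existing (dst, src) pair, or -1. */
static int peer_find_ep(const struct peer_eps *p, uint32_t dst, uint32_t src)
{
    for (size_t i = 0; i < p->n; i++)
        if (p->eps[i].dst == dst && p->eps[i].src == src)
            return (int)i;
    return -1;
}

/* Add an endpoint unless already known; returns its index, or -1 if full. */
static int peer_add_ep(struct peer_eps *p, uint32_t dst, uint32_t src)
{
    int i = peer_find_ep(p, dst, src);

    if (i >= 0)
        return i;
    if (p->n == MAX_EPS)
        return -1;
    p->eps[p->n] = (struct ep){ .dst = dst, .src = src };
    return (int)p->n++;
}

/*
 * Called with the source address chosen by a full route lookup. If it
 * differs from the active endpoint's source, clone the active endpoint
 * with the new source and return the candidate's index so the caller can
 * send a handshake over it. Data keeps using the known-good endpoint.
 */
static int peer_on_route_lookup(struct peer_eps *p, uint32_t rt_src)
{
    const struct ep *cur;

    assert(p->active < p->n);
    cur = &p->eps[p->active];
    if (rt_src == cur->src)
        return -1;  /* nothing changed */
    return peer_add_ep(p, cur->dst, rt_src);
}

/* A handshake over endpoint idx completed: switch user data to it. */
static void peer_on_handshake_ok(struct peer_eps *p, size_t idx)
{
    assert(idx < p->n);
    p->active = idx;
}
```

For example, take a peer at 203.0.113.1 that is reached via the LTE source 198.51.100.1 after a failover. Once the primary is back and the lookup returns 192.0.2.1, `peer_on_route_lookup()` yields a new candidate, data keeps flowing over the old endpoint, and `peer_on_handshake_ok()` switches over only after the handshake completes.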