* Source IP incorrect on multi homed systems
From: Nico Schottelius @ 2023-02-18 20:14 UTC
To: WireGuard mailing list

Dear group,

I was wondering how wireguard [Linux kernel] or wireguard-go [FreeBSD]
are supposed to decide which IP address to use for replying?

I have seen on both FreeBSD and Linux that wireguard seems to use the IP
address of the outgoing interface, i.e. the one with the route returning
to the sender. However, in multi homed situations this can be wrong;
let's take this example:

19:57:24.607526 net1 In IP 194.5.220.43.60770 > 147.78.195.254.51820: UDP, length 148
19:57:24.608358 net2 Out IP 195.141.200.73.51820 > 194.5.220.43.60770: UDP, length 92

The initiator sends from 194.5.220.43 to the receiver 147.78.195.254.
Wireguard then replies with the source IP of 195.141.200.73 instead of
147.78.195.254.

As the node is multi homed, the packet might leave through any of its
uplinks and thus return with a random (unexpected) IP address, which will
not pass NAT rules on firewalls and will finally be dropped. For instance,
in the above example the firewall drops the packet from 195.141.200.73,
because there is no session entry for it.

I have observed this behaviour on both Linux 6.1.11 and
wireguard-go 0.0.20220316_8,1 on FreeBSD, and in both cases the
connection breaks depending on which active interface is taken as
exit.

I would argue that wireguard should by default invert the IP
addresses, i.e. swap dst=src, src=dst, and then reply with that,
instead of adopting an interface-specific address. Or is there a good
reason for the current behaviour?

Best regards,

Nico

--
Sustainable and modern Infrastructures by ungleich.ch

^ permalink raw reply [flat|nested] 34+ messages in thread
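The two reply behaviours being compared can be modelled in a few lines. This is an illustrative sketch only: it encodes the addresses and the single default route via net2 from this thread, and the names `Datagram`, `route_based_reply` and `swapped_reply` are hypothetical, not WireGuard code. The first function picks the source from the interface used by the route back to the peer (what the tcpdump shows); the second simply swaps source and destination, as proposed in the message.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class Datagram:
    src: str
    sport: int
    dst: str
    dport: int


# Server-side view taken from this thread: connected networks plus the
# default route "via 195.141.200.72 dev net2". Entries are (prefix, device).
ROUTES = [
    ("147.78.195.224/27", "net1"),
    ("195.141.200.72/31", "net2"),
    ("0.0.0.0/0", "net2"),
]
IFACE_ADDR = {"net1": "147.78.195.254", "net2": "195.141.200.73"}


def route_based_reply(req: Datagram) -> Datagram:
    """Source = address of the interface that the route back to the peer uses."""
    peer = ipaddress.ip_address(req.src)
    candidates = [
        (ipaddress.ip_network(prefix), dev)
        for prefix, dev in ROUTES
        if peer in ipaddress.ip_network(prefix)
    ]
    _net, dev = max(candidates, key=lambda c: c[0].prefixlen)  # longest prefix wins
    return Datagram(IFACE_ADDR[dev], req.dport, req.src, req.sport)


def swapped_reply(req: Datagram) -> Datagram:
    """Source = the address the request was sent to (src/dst swapped)."""
    return Datagram(req.dst, req.dport, req.src, req.sport)
```

Fed the handshake packet from the tcpdump above, the route-based variant yields 195.141.200.73 as source while the swapped variant yields 147.78.195.254.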
* Re: Source IP incorrect on multi homed systems
From: Nico Schottelius @ 2023-02-18 22:34 UTC
To: Omkhar Arasaratnam; +Cc: Nico Schottelius, WireGuard mailing list

Hello Omkhar,

I tend to disagree. The problem is not the routing, but the selected
source address, which is independent of routing.

To be more specific: as there is BGP routing on all interfaces,
147.78.195.254 is an accepted IP address on any interface.

Best regards,

Nico

Omkhar Arasaratnam <omkhar@gmail.com> writes:

> This looks like an asymmetric routing issue from what you’re describing, not a wireguard issue.
>
> You may want to look into policy based routing to address it.
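Policy-based routing, as suggested here, selects a routing table based on the packet's source address, similar to what `ip rule add from <address> table <N>` configures. The following is a minimal Python model under stated assumptions: table `100` and its single default route via net1 are hypothetical, and `RULES`, `TABLES` and `lookup` are illustrative names. Note that such rules only choose the egress path for a source address that has already been selected; they do not change which source address the sender picks.

```python
import ipaddress

# (priority, "from" prefix, table). Lower priority numbers are evaluated first,
# as with `ip rule`. Table "100" and its contents are illustrative assumptions.
RULES = [
    (100, "147.78.195.254/32", "100"),
    (32766, "0.0.0.0/0", "main"),
]
# Each table maps destination prefixes to an outgoing device.
TABLES = {
    "100": [("0.0.0.0/0", "net1")],
    "main": [("147.78.195.224/27", "net1"), ("0.0.0.0/0", "net2")],
}


def lookup(src: str, dst: str) -> str:
    """Return the outgoing device for a packet with the given source/destination."""
    s, d = ipaddress.ip_address(src), ipaddress.ip_address(dst)
    for _prio, from_prefix, table in sorted(RULES):
        if s not in ipaddress.ip_network(from_prefix):
            continue
        matches = [
            (ipaddress.ip_network(prefix), dev)
            for prefix, dev in TABLES[table]
            if d in ipaddress.ip_network(prefix)
        ]
        if matches:  # no match in this table: fall through to the next rule
            return max(matches, key=lambda m: m[0].prefixlen)[1]
    raise LookupError(f"no route from {src} to {dst}")
```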
* Re: Source IP incorrect on multi homed systems
From: Mike O'Connor @ 2023-02-19 0:45 UTC
To: Nico Schottelius, WireGuard mailing list

Generally, when sending from a local process, all OSs will use the
address of the outgoing interface as the source of the packet.

If the packet is forwarded and no NAT is used, the packet will be
routed via the interface suggested by the routing table.

So local routing can be a real pain; policy-based routing is an
option. Another option could be to set up an 'output' NAT to an
address which is multi-homed.

I have a multi-homed system running without issue, other than the
actual routing machine. This machine is BGP connected to three
locations.

There is no NAT set up, and it works because I also add the wireguard
link addresses to the BGP sessions.

Cheers
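Source selection for locally generated packets, as described here, can be approximated as: find the longest-prefix route to the destination, then use the route's preferred source (the `src` attribute of `ip route`) if one is set, otherwise an address of the outgoing interface. The sketch below is a simplification (the Linux kernel also weighs address scope and primary/secondary addresses), and `Route`, `IFACE_ADDR` and `select_source` are hypothetical names filled with this thread's addresses.

```python
import ipaddress
from typing import NamedTuple, Optional


class Route(NamedTuple):
    prefix: str
    dev: str
    prefsrc: Optional[str] = None  # corresponds to "src ADDRESS" on an ip route


# One IPv4 address per interface, from the `ip a` output in this thread.
IFACE_ADDR = {"net1": "147.78.195.254", "net2": "195.141.200.73"}


def select_source(dst: str, routes: list) -> str:
    """Pick a source address for a locally generated packet to dst."""
    d = ipaddress.ip_address(dst)
    matching = [r for r in routes if d in ipaddress.ip_network(r.prefix)]
    if not matching:
        raise LookupError(f"no route to {dst}")
    best = max(matching, key=lambda r: ipaddress.ip_network(r.prefix).prefixlen)
    # Preferred source from the route if set, else the outgoing interface's address.
    return best.prefsrc or IFACE_ADDR[best.dev]
```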
* Re: Source IP incorrect on multi homed systems
From: Nico Schottelius @ 2023-02-19 8:01 UTC
To: Mike O'Connor; +Cc: Nico Schottelius, WireGuard mailing list

Let me rephrase the problem statement:

 - ping and http calls to the multi homed machine work correctly:
   I can ping 147.78.195.254 and the reply contains the same address.
   I can ping 195.141.200.73 and the reply contains the same address.
   I can curl 147.78.195.254 and the reply contains the same address.
   I can curl 195.141.200.73 and the reply contains the same address.

 - wireguard does NOT work because it changes the reply address:
   A packet sent to 147.78.195.254 is being replied to from 195.141.200.73.

In general, processes reply with the IP address that was used to contact
them and not with the outgoing interface address, which would also break
adding IP addresses to the loopback interface.

For full details, see the IP addresses [0], routing [1] and executed
tests [2] below.

I believe that this is a bug in wireguard.

--------------------------------------------------------------------------------

[2]

Let's see what it looks like in detail:

1) ping to 147.78.195.254: works

[9:14] nb3:~% ping -c2 147.78.195.254
PING 147.78.195.254 (147.78.195.254) 56(84) bytes of data.
64 bytes from 147.78.195.254: icmp_seq=1 ttl=53 time=7.27 ms
64 bytes from 147.78.195.254: icmp_seq=2 ttl=53 time=6.30 ms

--- 147.78.195.254 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1002ms
rtt min/avg/max/mdev = 6.296/6.781/7.267/0.485 ms

/ # tcpdump -ni any host 194.5.220.43
tcpdump: data link type LINUX_SLL2
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on any, link-type LINUX_SLL2 (Linux cooked v2), snapshot length 262144 bytes
08:14:48.379618 net1 In IP 194.5.220.43 > 147.78.195.254: ICMP echo request, id 89, seq 1, length 64
08:14:48.379651 net2 Out IP 147.78.195.254 > 194.5.220.43: ICMP echo reply, id 89, seq 1, length 64
08:14:49.380340 net1 In IP 194.5.220.43 > 147.78.195.254: ICMP echo request, id 89, seq 2, length 64
08:14:49.380392 net2 Out IP 147.78.195.254 > 194.5.220.43: ICMP echo reply, id 89, seq 2, length 64

2) ping to 195.141.200.73

[9:14] nb3:~% ping -c2 195.141.200.73
PING 195.141.200.73 (195.141.200.73) 56(84) bytes of data.
64 bytes from 195.141.200.73: icmp_seq=1 ttl=53 time=11.3 ms
64 bytes from 195.141.200.73: icmp_seq=2 ttl=53 time=6.81 ms

--- 195.141.200.73 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1002ms
rtt min/avg/max/mdev = 6.813/9.057/11.301/2.244 ms
[9:15] nb3:~%

/ # tcpdump -ni any host 194.5.220.43
tcpdump: data link type LINUX_SLL2
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on any, link-type LINUX_SLL2 (Linux cooked v2), snapshot length 262144 bytes
08:16:19.257697 net2 In IP 194.5.220.43 > 195.141.200.73: ICMP echo request, id 91, seq 1, length 64
08:16:19.257730 net2 Out IP 195.141.200.73 > 194.5.220.43: ICMP echo reply, id 91, seq 1, length 64
08:16:20.250948 net2 In IP 194.5.220.43 > 195.141.200.73: ICMP echo request, id 91, seq 2, length 64
08:16:20.250980 net2 Out IP 195.141.200.73 > 194.5.220.43: ICMP echo reply, id 91, seq 2, length 64

3) http to 147.78.195.254

[9:16] nb3:~% curl -s 147.78.195.254 > /dev/null ; echo $?
0

/ # tcpdump -ni any host 194.5.220.43
tcpdump: data link type LINUX_SLL2
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on any, link-type LINUX_SLL2 (Linux cooked v2), snapshot length 262144 bytes
08:17:04.082945 net1 In IP 194.5.220.43.39274 > 147.78.195.254.80: Flags [S], seq 1405408358, win 64240, options [mss 1460,sackOK,TS val 1380610701 ecr 0,nop,wscale 7], length 0
08:17:04.082983 net2 Out IP 147.78.195.254.80 > 194.5.220.43.39274: Flags [S.], seq 3790092363, ack 1405408359, win 65160, options [mss 1460,sackOK,TS val 520503591 ecr 1380610701,nop,wscale 7], length 0
08:17:04.089996 net1 In IP 194.5.220.43.39274 > 147.78.195.254.80: Flags [.], ack 1, win 502, options [nop,nop,TS val 1380610709 ecr 520503591], length 0
08:17:04.090121 net1 In IP 194.5.220.43.39274 > 147.78.195.254.80: Flags [P.], seq 1:79, ack 1, win 502, options [nop,nop,TS val 1380610709 ecr 520503591], length 78: HTTP: GET / HTTP/1.1
08:17:04.090136 net2 Out IP 147.78.195.254.80 > 194.5.220.43.39274: Flags [.], ack 79, win 509, options [nop,nop,TS val 520503598 ecr 1380610709], length 0
08:17:04.090301 net2 Out IP 147.78.195.254.80 > 194.5.220.43.39274: Flags [P.], seq 1:239, ack 79, win 509, options [nop,nop,TS val 520503598 ecr 1380610709], length 238: HTTP: HTTP/1.1 200 OK
08:17:04.090381 net2 Out IP 147.78.195.254.80 > 194.5.220.43.39274: Flags [P.], seq 239:854, ack 79, win 509, options [nop,nop,TS val 520503598 ecr 1380610709], length 615: HTTP
08:17:04.096058 net1 In IP 194.5.220.43.39274 > 147.78.195.254.80: Flags [.], ack 239, win 501, options [nop,nop,TS val 1380610715 ecr 520503598], length 0
08:17:04.096059 net1 In IP 194.5.220.43.39274 > 147.78.195.254.80: Flags [.], ack 854, win 497, options [nop,nop,TS val 1380610715 ecr 520503598], length 0
08:17:04.096339 net1 In IP 194.5.220.43.39274 > 147.78.195.254.80: Flags [F.], seq 79, ack 854, win 501, options [nop,nop,TS val 1380610715 ecr 520503598], length 0
08:17:04.096450 net2 Out IP 147.78.195.254.80 > 194.5.220.43.39274: Flags [F.], seq 854, ack 80, win 509, options [nop,nop,TS val 520503604 ecr 1380610715], length 0
08:17:04.102609 net1 In IP 194.5.220.43.39274 > 147.78.195.254.80: Flags [.], ack 855, win 501, options [nop,nop,TS val 1380610721 ecr 520503604], length 0

4) http to 195.141.200.73

[9:17] nb3:~% curl -s 195.141.200.73 > /dev/null ; echo $?
0

/ # tcpdump -ni any host 194.5.220.43
tcpdump: data link type LINUX_SLL2
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on any, link-type LINUX_SLL2 (Linux cooked v2), snapshot length 262144 bytes
08:18:05.951066 net2 In IP 194.5.220.43.41484 > 195.141.200.73.80: Flags [S], seq 1556080700, win 64240, options [mss 1460,sackOK,TS val 765965336 ecr 0,nop,wscale 7], length 0
08:18:05.951106 net2 Out IP 195.141.200.73.80 > 194.5.220.43.41484: Flags [S.], seq 3465881361, ack 1556080701, win 65160, options [mss 1460,sackOK,TS val 3168643538 ecr 765965336,nop,wscale 7], length 0
08:18:05.958699 net2 In IP 194.5.220.43.41484 > 195.141.200.73.80: Flags [.], ack 1, win 502, options [nop,nop,TS val 765965342 ecr 3168643538], length 0
08:18:05.958749 net2 In IP 194.5.220.43.41484 > 195.141.200.73.80: Flags [P.], seq 1:79, ack 1, win 502, options [nop,nop,TS val 765965342 ecr 3168643538], length 78: HTTP: GET / HTTP/1.1
08:18:05.958763 net2 Out IP 195.141.200.73.80 > 194.5.220.43.41484: Flags [.], ack 79, win 509, options [nop,nop,TS val 3168643545 ecr 765965342], length 0
08:18:05.959216 net2 Out IP 195.141.200.73.80 > 194.5.220.43.41484: Flags [P.], seq 1:239, ack 79, win 509, options [nop,nop,TS val 3168643546 ecr 765965342], length 238: HTTP: HTTP/1.1 200 OK
08:18:05.959327 net2 Out IP 195.141.200.73.80 > 194.5.220.43.41484: Flags [P.], seq 239:854, ack 79, win 509, options [nop,nop,TS val 3168643546 ecr 765965342], length 615: HTTP
08:18:05.965244 net2 In IP 194.5.220.43.41484 > 195.141.200.73.80: Flags [.], ack 239, win 501, options [nop,nop,TS val 765965350 ecr 3168643546], length 0
08:18:05.965348 net2 In IP 194.5.220.43.41484 > 195.141.200.73.80: Flags [.], ack 854, win 497, options [nop,nop,TS val 765965350 ecr 3168643546], length 0
08:18:05.965487 net2 In IP 194.5.220.43.41484 > 195.141.200.73.80: Flags [F.], seq 79, ack 854, win 501, options [nop,nop,TS val 765965350 ecr 3168643546], length 0
08:18:05.965573 net2 Out IP 195.141.200.73.80 > 194.5.220.43.41484: Flags [F.], seq 854, ack 80, win 509, options [nop,nop,TS val 3168643552 ecr 765965350], length 0
08:18:05.971916 net2 In IP 194.5.220.43.41484 > 195.141.200.73.80: Flags [.], ack 855, win 501, options [nop,nop,TS val 765965356 ecr 3168643552], length 0

[0]

wireguard "server" that changes the source ip:

/ # ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
3: eth0@if29: <BROADCAST,MULTICAST,UP,LOWER_UP,M-DOWN> mtu 1500 qdisc noqueue state UP
    link/ether 66:4a:9c:12:5b:6c brd ff:ff:ff:ff:ff:ff
    inet6 2a0a:e5c0:10:1e:7f21:83ca:a7d:46d2/128 scope global
       valid_lft forever preferred_lft forever
    inet6 fe80::644a:9cff:fe12:5b6c/64 scope link
       valid_lft forever preferred_lft forever
4: net1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000
    link/ether 3c:ec:ef:cb:d8:1b brd ff:ff:ff:ff:ff:ff
    inet 147.78.195.254/27 brd 147.78.195.255 scope global net1
       valid_lft forever preferred_lft forever
    inet6 2a0a:e5c0:1:8::53/64 scope global
       valid_lft forever preferred_lft forever
    inet6 fe80::3eec:efff:fecb:d81b/64 scope link
       valid_lft forever preferred_lft forever
5: v1477819464: <POINTOPOINT,NOARP,UP,LOWER_UP> mtu 1420 qdisc noqueue state UNKNOWN qlen 1000
    link/[65534]
    inet 147.78.194.65/26 scope global v1477819464
       valid_lft forever preferred_lft forever
    inet6 2a0a:e5c0:2e::1/64 scope global
       valid_lft forever preferred_lft forever
26: net2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000
    link/ether 3c:ec:ef:cb:d8:1c brd ff:ff:ff:ff:ff:ff
    inet 195.141.200.73/31 scope global net2
       valid_lft forever preferred_lft forever
    inet6 2001:1700:3500:2::12/124 scope global
       valid_lft forever preferred_lft forever
    inet6 fe80::3eec:efff:fecb:d81c/64 scope link
       valid_lft forever preferred_lft forever
/ #

wireguard client behind NAT:

nb3:/etc/wireguard# curl -4 ifconfig.io
194.5.220.43
nb3:/etc/wireguard# ip a sh dev wlan0
2: wlan0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 84:5c:f3:ed:52:9c brd ff:ff:ff:ff:ff:ff
    inet 192.168.4.85/24 brd 192.168.4.255 scope global dynamic noprefixroute wlan0
       valid_lft 317sec preferred_lft 242sec
    inet6 2a0a:e5c0:13:0:865c:f3ff:feed:529c/64 scope global dynamic mngtmpaddr noprefixroute
       valid_lft 86394sec preferred_lft 14394sec
    inet6 fe80::865c:f3ff:feed:529c/64 scope link
       valid_lft forever preferred_lft forever
nb3:/etc/wireguard#

[1]

/ # ip route get 194.5.220.43
194.5.220.43 via 195.141.200.72 dev net2 src 195.141.200.73
/ #
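One reason the curl tests behave differently from WireGuard: a TCP socket returned by `accept()` is bound to the specific local address the client connected to, even when the listening socket is bound to the wildcard address, so every segment it sends uses that address as source. A small loopback sketch (the helper name `accepted_local_address` is hypothetical):

```python
import socket


def accepted_local_address(host: str = "127.0.0.1") -> str:
    """Connect to a wildcard-bound TCP listener via `host` and return the
    local address of the accepted socket."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as listener:
        listener.settimeout(2)
        listener.bind(("0.0.0.0", 0))  # wildcard, like a server listening on all addresses
        listener.listen(1)
        port = listener.getsockname()[1]
        with socket.create_connection((host, port), timeout=2):
            conn, _peer = listener.accept()
            with conn:
                # The accepted socket is bound to the address the client targeted.
                return conn.getsockname()[0]
```

On Linux, where all of 127.0.0.0/8 is local, calling it with `"127.0.0.2"` should return `"127.0.0.2"`.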
* Re: Source IP incorrect on multi homed systems
From: Mikma @ 2023-02-19 9:19 UTC
To: wireguard, Nico Schottelius, Mike O'Connor; +Cc: WireGuard mailing list

Have you tried setting the preferred src address of the route(s) to the addresses you desire?

From "man ip":

> src ADDRESS the source address to prefer when sending to the destinations covered by the route prefix.
* Re: Source IP incorrect on multi homed systems
From: Nico Schottelius @ 2023-02-19 12:04 UTC
To: Mikma; +Cc: Nico Schottelius, Mike O'Connor, WireGuard mailing list

Hello Mikma,

Mikma <mikma.wg@lists.m7n.se> writes:

> Have you tried setting the preferred src address of the route(s) to the addresses you desire?
>
> From "man ip":
>
>> src ADDRESS the source address to prefer when sending to the destinations covered by the route prefix.

Unfortunately this does not solve the problem. The expected behaviour of
wireguard is to reply with the same IP address, like nginx and the kernel
ICMP handler do, not with a route-based outgoing interface IP address.

In a BGP based environment the route can vary dynamically, and I showed a
stripped down version to make it easier to understand. In practice, many
of our systems have 4-7 different upstreams, and the packet can come in on
any interface and should leave the machine on the currently correct
interface, depending on the route import.

In no case, however, should wireguard change the response address,
because this breaks stateful firewalls.

As demonstrated in my last email, both the in-kernel ICMP handler and
user space applications like nginx behave correctly on the same machine.

I briefly checked the wireguard source code and did not immediately spot
the network handling part that sets the source IP, so I am wondering
whether this bug is due to wireguard not handling it at all?

Best regards,

Nico
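For an unconnected UDP socket bound to the wildcard address there is no per-connection local address, so a reply sent with a plain `sendto()` gets whatever source the route lookup selects. On Linux, a user-space UDP service can instead ask for each datagram's destination address with the `IP_PKTINFO` socket option and pass it back as `ipi_spec_dst` when replying (see `ip(7)`). The Python sketch below assumes Linux; `open_wildcard_udp` and `echo_from_same_address` are hypothetical helpers, and this is not a description of how WireGuard or wireguard-go handle it internally.

```python
import socket
import struct

# Linux value from <linux/in.h>; used if this Python build does not
# export socket.IP_PKTINFO.
IP_PKTINFO = getattr(socket, "IP_PKTINFO", 8)
# struct in_pktinfo { int ipi_ifindex; struct in_addr ipi_spec_dst; struct in_addr ipi_addr; }
IN_PKTINFO = struct.Struct("=i4s4s")


def open_wildcard_udp(port=0):
    """UDP socket on 0.0.0.0 that reports each datagram's destination address."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.IPPROTO_IP, IP_PKTINFO, 1)
    sock.bind(("0.0.0.0", port))
    return sock


def echo_from_same_address(sock):
    """Receive one datagram and echo it back, using its destination as our source."""
    data, ancdata, _flags, peer = sock.recvmsg(2048, socket.CMSG_SPACE(IN_PKTINFO.size))
    local = None
    for level, ctype, cdata in ancdata:
        if level == socket.IPPROTO_IP and ctype == IP_PKTINFO:
            _ifindex, _spec_dst, header_dst = IN_PKTINFO.unpack(cdata[:IN_PKTINFO.size])
            local = header_dst
    if local is None:
        raise RuntimeError("no IP_PKTINFO control message received")
    # ipi_ifindex = 0: let the routing table choose the egress interface.
    # ipi_spec_dst = local: use the address the peer targeted as the source.
    reply_info = IN_PKTINFO.pack(0, local, bytes(4))
    sock.sendmsg([data], [(socket.IPPROTO_IP, IP_PKTINFO, reply_info)], 0, peer)
    return socket.inet_ntoa(local)
```

In a loopback test where a client bound to 127.0.0.1 sends to 127.0.0.2, the echo should come back from 127.0.0.2; a plain `sendto()` from the same wildcard socket would instead pick 127.0.0.1 from the route lookup.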
* Re: Source IP incorrect on multi homed systems
From: Nico Schottelius @ 2023-02-19 12:10 UTC
To: Nico Schottelius; +Cc: Mike O'Connor, WireGuard mailing list

Aside from nginx + ICMP being handled correctly as a reference, I want
to elaborate further on this case to show that something is really
wrong with the current behaviour:

A typical scenario for routers is to have a lot of globally reachable IP
addresses (IPv6, IPv4) assigned to the loopback interface, such as on
this system:

[13:11] router2.place6:~# ip a sh dev lo
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 2a0a:e5c0:1e:a::b/128 scope global
       valid_lft forever preferred_lft forever
    inet6 2a0a:e5c0:1e:a::a/128 scope global
       valid_lft forever preferred_lft forever
    inet6 2a0a:e5c0:2:a::b/128 scope global
       valid_lft forever preferred_lft forever
    inet6 2a0a:e5c0:2:a::a/128 scope global
       valid_lft forever preferred_lft forever
    inet6 2a0a:e5c0:2:1::7/128 scope global
       valid_lft forever preferred_lft forever
    inet6 2a0a:e5c0:2:1::6/128 scope global
       valid_lft forever preferred_lft forever
    inet6 2a0a:e5c0:2:1::5/128 scope global
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever

The motivation behind that is that these IP addresses are always
reachable, independent of the actual routing interface.

Now, if wireguard selects the source IP based on the outgoing interface,
this is never going to work, as lo cannot send packets to the outside
world.
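A workaround some UDP daemons use when the set of local addresses is known in advance (for example, addresses on the loopback interface) is to bind one socket per local address instead of a single wildcard socket: a reply sent on a socket bound to a specific address carries that address as its source, whichever interface the route sends it out of. A minimal IPv4-only Python sketch; `open_per_address` and `serve_once` are hypothetical helpers, and addresses added after the sockets are opened are not picked up.

```python
import selectors
import socket


def open_per_address(addresses, port=0):
    """Open one IPv4 UDP socket per local address. Pass a fixed port in
    practice; port 0 gives every socket its own ephemeral port."""
    socks = []
    for addr in addresses:
        s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        s.bind((addr, port))
        socks.append(s)
    return socks


def serve_once(socks, timeout=2.0):
    """Echo one datagram; the reply's source is the receiving socket's bound
    address, whichever interface the route sends it out of."""
    with selectors.DefaultSelector() as sel:
        for s in socks:
            sel.register(s, selectors.EVENT_READ)
        events = sel.select(timeout)
        if not events:
            raise TimeoutError("no datagram received")
        sock = events[0][0].fileobj
        data, peer = sock.recvfrom(2048)
        sock.sendto(data, peer)
        return sock.getsockname()[0]
```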
Nico Schottelius <nico.schottelius@ungleich.ch> writes: > Let me rephrase the problem statement: > > - ping and http calls to the multi homed machine work correctly: > I can ping 147.78.195.254 and the reply contains the same address. > I can ping 195.141.200.73 and the reply contains the same address. > I can curl 147.78.195.254 and the reply contains the same address. > I can curl 195.141.200.73 and the reply contains the same address. > > - wireguard does NOT work because it changes the reply address: > A packet sent to 147.78.195.254 is being replied with 195.141.200.73 > > In general, processes reply with the IP address that was used to contact > them and not with the outgoing interface address, which would also break > adding IP addresses to the loopback interface. > > For full detail, see ip addresses [0] and routing below [1] and tests > executed [2]. > > I believe that this is a bug in wireguard. > > -------------------------------------------------------------------------------- > > [2] > > Let's see how it looks like in detail: > > 1) ping to 147.78.195.254: works > > [9:14] nb3:~% ping -c2 147.78.195.254 > PING 147.78.195.254 (147.78.195.254) 56(84) bytes of data. > 64 bytes from 147.78.195.254: icmp_seq=1 ttl=53 time=7.27 ms > 64 bytes from 147.78.195.254: icmp_seq=2 ttl=53 time=6.30 ms > > --- 147.78.195.254 ping statistics --- > 2 packets transmitted, 2 received, 0% packet loss, time 1002ms > rtt min/avg/max/mdev = 6.296/6.781/7.267/0.485 ms > > / # tcpdump -ni any host 194.5.220.43 > tcpdump: data link type LINUX_SLL2 > tcpdump: verbose output suppressed, use -v[v]... 
for full protocol decode > listening on any, link-type LINUX_SLL2 (Linux cooked v2), snapshot length 262144 bytes > 08:14:48.379618 net1 In IP 194.5.220.43 > 147.78.195.254: ICMP echo request, id 89, seq 1, length 64 > 08:14:48.379651 net2 Out IP 147.78.195.254 > 194.5.220.43: ICMP echo reply, id 89, seq 1, length 64 > 08:14:49.380340 net1 In IP 194.5.220.43 > 147.78.195.254: ICMP echo request, id 89, seq 2, length 64 > 08:14:49.380392 net2 Out IP 147.78.195.254 > 194.5.220.43: ICMP echo reply, id 89, seq 2, length 64 > > 2) ping to 195.141.200.73 > > [9:14] nb3:~% ping -c2 195.141.200.73 > PING 195.141.200.73 (195.141.200.73) 56(84) bytes of data. > 64 bytes from 195.141.200.73: icmp_seq=1 ttl=53 time=11.3 ms > 64 bytes from 195.141.200.73: icmp_seq=2 ttl=53 time=6.81 ms > > --- 195.141.200.73 ping statistics --- > 2 packets transmitted, 2 received, 0% packet loss, time 1002ms > rtt min/avg/max/mdev = 6.813/9.057/11.301/2.244 ms > [9:15] nb3:~% > / # tcpdump -ni any host 194.5.220.43 > tcpdump: data link type LINUX_SLL2 > tcpdump: verbose output suppressed, use -v[v]... for full protocol decode > listening on any, link-type LINUX_SLL2 (Linux cooked v2), snapshot length 262144 bytes > 08:16:19.257697 net2 In IP 194.5.220.43 > 195.141.200.73: ICMP echo request, id 91, seq 1, length 64 > 08:16:19.257730 net2 Out IP 195.141.200.73 > 194.5.220.43: ICMP echo reply, id 91, seq 1, length 64 > 08:16:20.250948 net2 In IP 194.5.220.43 > 195.141.200.73: ICMP echo request, id 91, seq 2, length 64 > 08:16:20.250980 net2 Out IP 195.141.200.73 > 194.5.220.43: ICMP echo reply, id 91, seq 2, length 64 > > 3) http to 147.78.195.254 > > [9:16] nb3:~% curl -s 147.78.195.254 > /dev/null ; echo $? > 0 > / # tcpdump -ni any host 194.5.220.43 > tcpdump: data link type LINUX_SLL2 > tcpdump: verbose output suppressed, use -v[v]... 
for full protocol decode > listening on any, link-type LINUX_SLL2 (Linux cooked v2), snapshot length 262144 bytes > 08:17:04.082945 net1 In IP 194.5.220.43.39274 > 147.78.195.254.80: Flags [S], seq 1405408358, win 64240, options [mss 1460,sackOK,TS val 1380610701 ecr 0,nop,wscale 7], length 0 > 08:17:04.082983 net2 Out IP 147.78.195.254.80 > 194.5.220.43.39274: Flags [S.], seq 3790092363, ack 1405408359, win 65160, options [mss 1460,sackOK,TS val 520503591 ecr 1380610701,nop,wscale 7], length 0 > 08:17:04.089996 net1 In IP 194.5.220.43.39274 > 147.78.195.254.80: Flags [.], ack 1, win 502, options [nop,nop,TS val 1380610709 ecr 520503591], length 0 > 08:17:04.090121 net1 In IP 194.5.220.43.39274 > 147.78.195.254.80: Flags [P.], seq 1:79, ack 1, win 502, options [nop,nop,TS val 1380610709 ecr 520503591], length 78: HTTP: GET / HTTP/1.1 > 08:17:04.090136 net2 Out IP 147.78.195.254.80 > 194.5.220.43.39274: Flags [.], ack 79, win 509, options [nop,nop,TS val 520503598 ecr 1380610709], length 0 > 08:17:04.090301 net2 Out IP 147.78.195.254.80 > 194.5.220.43.39274: Flags [P.], seq 1:239, ack 79, win 509, options [nop,nop,TS val 520503598 ecr 1380610709], length 238: HTTP: HTTP/1.1 200 OK > 08:17:04.090381 net2 Out IP 147.78.195.254.80 > 194.5.220.43.39274: Flags [P.], seq 239:854, ack 79, win 509, options [nop,nop,TS val 520503598 ecr 1380610709], length 615: HTTP > 08:17:04.096058 net1 In IP 194.5.220.43.39274 > 147.78.195.254.80: Flags [.], ack 239, win 501, options [nop,nop,TS val 1380610715 ecr 520503598], length 0 > 08:17:04.096059 net1 In IP 194.5.220.43.39274 > 147.78.195.254.80: Flags [.], ack 854, win 497, options [nop,nop,TS val 1380610715 ecr 520503598], length 0 > 08:17:04.096339 net1 In IP 194.5.220.43.39274 > 147.78.195.254.80: Flags [F.], seq 79, ack 854, win 501, options [nop,nop,TS val 1380610715 ecr 520503598], length 0 > 08:17:04.096450 net2 Out IP 147.78.195.254.80 > 194.5.220.43.39274: Flags [F.], seq 854, ack 80, win 509, options [nop,nop,TS val 
520503604 ecr 1380610715], length 0 > 08:17:04.102609 net1 In IP 194.5.220.43.39274 > 147.78.195.254.80: Flags [.], ack 855, win 501, options [nop,nop,TS val 1380610721 ecr 520503604], length 0 > > > 4) http to 195.141.200.73 > > [9:17] nb3:~% curl -s 195.141.200.73 > /dev/null ; echo $? > 0 > > / # tcpdump -ni any host 194.5.220.43 > tcpdump: data link type LINUX_SLL2 > tcpdump: verbose output suppressed, use -v[v]... for full protocol decode > listening on any, link-type LINUX_SLL2 (Linux cooked v2), snapshot length 262144 bytes > 08:18:05.951066 net2 In IP 194.5.220.43.41484 > 195.141.200.73.80: Flags [S], seq 1556080700, win 64240, options [mss 1460,sackOK,TS val 765965336 ecr 0,nop,wscale 7], length 0 > 08:18:05.951106 net2 Out IP 195.141.200.73.80 > 194.5.220.43.41484: Flags [S.], seq 3465881361, ack 1556080701, win 65160, options [mss 1460,sackOK,TS val 3168643538 ecr 765965336,nop,wscale 7], length 0 > 08:18:05.958699 net2 In IP 194.5.220.43.41484 > 195.141.200.73.80: Flags [.], ack 1, win 502, options [nop,nop,TS val 765965342 ecr 3168643538], length 0 > 08:18:05.958749 net2 In IP 194.5.220.43.41484 > 195.141.200.73.80: Flags [P.], seq 1:79, ack 1, win 502, options [nop,nop,TS val 765965342 ecr 3168643538], length 78: HTTP: GET / HTTP/1.1 > 08:18:05.958763 net2 Out IP 195.141.200.73.80 > 194.5.220.43.41484: Flags [.], ack 79, win 509, options [nop,nop,TS val 3168643545 ecr 765965342], length 0 > 08:18:05.959216 net2 Out IP 195.141.200.73.80 > 194.5.220.43.41484: Flags [P.], seq 1:239, ack 79, win 509, options [nop,nop,TS val 3168643546 ecr 765965342], length 238: HTTP: HTTP/1.1 200 OK > 08:18:05.959327 net2 Out IP 195.141.200.73.80 > 194.5.220.43.41484: Flags [P.], seq 239:854, ack 79, win 509, options [nop,nop,TS val 3168643546 ecr 765965342], length 615: HTTP > 08:18:05.965244 net2 In IP 194.5.220.43.41484 > 195.141.200.73.80: Flags [.], ack 239, win 501, options [nop,nop,TS val 765965350 ecr 3168643546], length 0 > 08:18:05.965348 net2 In IP 
194.5.220.43.41484 > 195.141.200.73.80: Flags [.], ack 854, win 497, options [nop,nop,TS val 765965350 ecr 3168643546], length 0 > 08:18:05.965487 net2 In IP 194.5.220.43.41484 > 195.141.200.73.80: Flags [F.], seq 79, ack 854, win 501, options [nop,nop,TS val 765965350 ecr 3168643546], length 0 > 08:18:05.965573 net2 Out IP 195.141.200.73.80 > 194.5.220.43.41484: Flags [F.], seq 854, ack 80, win 509, options [nop,nop,TS val 3168643552 ecr 765965350], length 0 > 08:18:05.971916 net2 In IP 194.5.220.43.41484 > 195.141.200.73.80: Flags [.], ack 855, win 501, options [nop,nop,TS val 765965356 ecr 3168643552], length 0 > > > > [0] > wireguard "server" that changes the source ip: > > / # ip a > 1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1000 > link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00 > inet 127.0.0.1/8 scope host lo > valid_lft forever preferred_lft forever > inet6 ::1/128 scope host > valid_lft forever preferred_lft forever > 3: eth0@if29: <BROADCAST,MULTICAST,UP,LOWER_UP,M-DOWN> mtu 1500 qdisc noqueue state UP > link/ether 66:4a:9c:12:5b:6c brd ff:ff:ff:ff:ff:ff > inet6 2a0a:e5c0:10:1e:7f21:83ca:a7d:46d2/128 scope global > valid_lft forever preferred_lft forever > inet6 fe80::644a:9cff:fe12:5b6c/64 scope link > valid_lft forever preferred_lft forever > 4: net1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000 > link/ether 3c:ec:ef:cb:d8:1b brd ff:ff:ff:ff:ff:ff > inet 147.78.195.254/27 brd 147.78.195.255 scope global net1 > valid_lft forever preferred_lft forever > inet6 2a0a:e5c0:1:8::53/64 scope global > valid_lft forever preferred_lft forever > inet6 fe80::3eec:efff:fecb:d81b/64 scope link > valid_lft forever preferred_lft forever > 5: v1477819464: <POINTOPOINT,NOARP,UP,LOWER_UP> mtu 1420 qdisc noqueue state UNKNOWN qlen 1000 > link/[65534] > inet 147.78.194.65/26 scope global v1477819464 > valid_lft forever preferred_lft forever > inet6 2a0a:e5c0:2e::1/64 scope global > valid_lft forever preferred_lft 
forever > 26: net2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000 > link/ether 3c:ec:ef:cb:d8:1c brd ff:ff:ff:ff:ff:ff > inet 195.141.200.73/31 scope global net2 > valid_lft forever preferred_lft forever > inet6 2001:1700:3500:2::12/124 scope global > valid_lft forever preferred_lft forever > inet6 fe80::3eec:efff:fecb:d81c/64 scope link > valid_lft forever preferred_lft forever > / # > > wireguard client behind nat: > > nb3:/etc/wireguard# curl -4 ifconfig.io > 194.5.220.43 > nb3:/etc/wireguard# ip a sh dev wlan0 > 2: wlan0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000 > link/ether 84:5c:f3:ed:52:9c brd ff:ff:ff:ff:ff:ff > inet 192.168.4.85/24 brd 192.168.4.255 scope global dynamic noprefixroute wlan0 > valid_lft 317sec preferred_lft 242sec > inet6 2a0a:e5c0:13:0:865c:f3ff:feed:529c/64 scope global dynamic mngtmpaddr noprefixroute > valid_lft 86394sec preferred_lft 14394sec > inet6 fe80::865c:f3ff:feed:529c/64 scope link > valid_lft forever preferred_lft forever > nb3:/etc/wireguard# > > > [1] > / # ip route get 194.5.220.43 > 194.5.220.43 via 195.141.200.72 dev net2 src 195.141.200.73 > / # > > > Mike O'Connor <mike@pineview.net> writes: > >> Generally all OSs will if sending from a local process will use the >> address of the outgoing interface for the packet. >> >> If the packet is forwarded and no NAT is used the address will be >> routed via the interface suggested by the routing table. >> >> So local routing can be a real pain, policy based routing is an >> option. The other option could be to setup an 'output' NAT to an >> address which is multi-homed. >> >> I have a system running which is multi-homed with out issue other than >> the actual routing machine. This machine is BGP connected to three >> locations. >> >> There is no NAT setup and because I also add the wireguard link >> addresses to the BGP sessions. 
>> >> Cheers >> >> >> >> On 19/2/2023 6:44 am, Nico Schottelius wrote: >>> Dear group, >>> >>> I was wondering how wireguard [Linux kernel] or wireguard-go [FreeBSD] >>> are supposed to decide which IP address to use for replying? >>> >>> I have seen both on FreeBSD and Linux that wireguard seems to use the IP >>> address of the outgoing interface, i.e. the one with the route returning >>> to the sender. However in multi homed situations, this can be wrong, >>> let's take this example: >>> >>> 19:57:24.607526 net1 In IP 194.5.220.43.60770 > 147.78.195.254.51820: UDP, length 148 >>> 19:57:24.608358 net2 Out IP 195.141.200.73.51820 > 194.5.220.43.60770: UDP, length 92 >>> >>> The initiator sends from 194.5.220.43 to the receiver 147.78.195.254. >>> Wireguard then replies with the source IP of 195.141.200.73 instead of >>> 147.78.195.254. >>> >>> As the node is multi homed, the packet might leave through any of its >>> uplinks and thus return with a random (unexpected) IP address and will >>> not pass NAT rules on firewalls and finally be dropped. F.i. in above >>> example the firewall drops the packet from 195.141.200.73, because there >>> is no session entry for that. >>> >>> I have observed this behaviour both on Linux 6.1.11 as well as >>> wireguard-go 0.0.20220316_8,1 on FreeBSD and in both cases the >>> connection will break depending on which active interface is taken as >>> exit. >>> >>> I would argue that wireguard should by default invert the IP >>> addresses, i.e. switch dst=src, src=dst and then reply with that, >>> instead of adapting an interface specific address, or is there a good >>> reason for the current behaviour? >>> >>> Best regards, >>> >>> Nico >>> >>> -- >>> Sustainable and modern Infrastructures by ungleich.ch -- Sustainable and modern Infrastructures by ungleich.ch
* Re: Source IP incorrect on multi homed systems
From: Peter Linder @ 2023-02-19 18:59 UTC
To: wireguard

Indeed, this is how you typically set up a multihomed service (addresses
on lo, then announced using BGP or something similar). If you use one of
the network links directly for the service and that link network goes
down (it may not even be in your AS, so you may not know?), then the
service is offline.

Use a route-map in your BGP config to set the src address of routes to
the address on lo; that works for wg :)

/Peter

On 2023-02-19 13:10, Nico Schottelius wrote:
> Aside from nginx + icmp being handled correctly as a reference,
> I want to further elaborate on this case to show that something is
> really wrong with the current behaviour:
>
> A typical scenario for routers is to have a lot of global reachable IP
> addresses (IPv6, IPv4) assigned to the loopback interface, such as this
> system:
>
> [13:11] router2.place6:~# ip a sh dev lo
> 1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
>     link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
>     inet 127.0.0.1/8 scope host lo
>        valid_lft forever preferred_lft forever
>     inet6 2a0a:e5c0:1e:a::b/128 scope global
>        valid_lft forever preferred_lft forever
>     inet6 2a0a:e5c0:1e:a::a/128 scope global
>        valid_lft forever preferred_lft forever
>     inet6 2a0a:e5c0:2:a::b/128 scope global
>        valid_lft forever preferred_lft forever
>     inet6 2a0a:e5c0:2:a::a/128 scope global
>        valid_lft forever preferred_lft forever
>     inet6 2a0a:e5c0:2:1::7/128 scope global
>        valid_lft forever preferred_lft forever
>     inet6 2a0a:e5c0:2:1::6/128 scope global
>        valid_lft forever preferred_lft forever
>     inet6 2a0a:e5c0:2:1::5/128 scope global
>        valid_lft forever preferred_lft forever
>     inet6 ::1/128 scope host
>        valid_lft forever preferred_lft forever
>
> The motivation behind that
is that independent of the actual routing > interface, these IP addresses are always reachable. > > Now in the case of wireguard selecting the source IP based on the > outgoing interface, this is never going to work, as lo cannot send > packets to the outside world. > > > Nico Schottelius <nico.schottelius@ungleich.ch> writes: > >> Let me rephrase the problem statement: >> >> - ping and http calls to the multi homed machine work correctly: >> I can ping 147.78.195.254 and the reply contains the same address. >> I can ping 195.141.200.73 and the reply contains the same address. >> I can curl 147.78.195.254 and the reply contains the same address. >> I can curl 195.141.200.73 and the reply contains the same address. >> >> - wireguard does NOT work because it changes the reply address: >> A packet sent to 147.78.195.254 is being replied with 195.141.200.73 >> >> In general, processes reply with the IP address that was used to contact >> them and not with the outgoing interface address, which would also break >> adding IP addresses to the loopback interface. >> >> For full detail, see ip addresses [0] and routing below [1] and tests >> executed [2]. >> >> I believe that this is a bug in wireguard. >> >> -------------------------------------------------------------------------------- >> >> [2] >> >> Let's see how it looks like in detail: >> >> 1) ping to 147.78.195.254: works >> >> [9:14] nb3:~% ping -c2 147.78.195.254 >> PING 147.78.195.254 (147.78.195.254) 56(84) bytes of data. >> 64 bytes from 147.78.195.254: icmp_seq=1 ttl=53 time=7.27 ms >> 64 bytes from 147.78.195.254: icmp_seq=2 ttl=53 time=6.30 ms >> >> --- 147.78.195.254 ping statistics --- >> 2 packets transmitted, 2 received, 0% packet loss, time 1002ms >> rtt min/avg/max/mdev = 6.296/6.781/7.267/0.485 ms >> >> / # tcpdump -ni any host 194.5.220.43 >> tcpdump: data link type LINUX_SLL2 >> tcpdump: verbose output suppressed, use -v[v]... 
for full protocol decode >> listening on any, link-type LINUX_SLL2 (Linux cooked v2), snapshot length 262144 bytes >> 08:14:48.379618 net1 In IP 194.5.220.43 > 147.78.195.254: ICMP echo request, id 89, seq 1, length 64 >> 08:14:48.379651 net2 Out IP 147.78.195.254 > 194.5.220.43: ICMP echo reply, id 89, seq 1, length 64 >> 08:14:49.380340 net1 In IP 194.5.220.43 > 147.78.195.254: ICMP echo request, id 89, seq 2, length 64 >> 08:14:49.380392 net2 Out IP 147.78.195.254 > 194.5.220.43: ICMP echo reply, id 89, seq 2, length 64 >> >> 2) ping to 195.141.200.73 >> >> [9:14] nb3:~% ping -c2 195.141.200.73 >> PING 195.141.200.73 (195.141.200.73) 56(84) bytes of data. >> 64 bytes from 195.141.200.73: icmp_seq=1 ttl=53 time=11.3 ms >> 64 bytes from 195.141.200.73: icmp_seq=2 ttl=53 time=6.81 ms >> >> --- 195.141.200.73 ping statistics --- >> 2 packets transmitted, 2 received, 0% packet loss, time 1002ms >> rtt min/avg/max/mdev = 6.813/9.057/11.301/2.244 ms >> [9:15] nb3:~% >> / # tcpdump -ni any host 194.5.220.43 >> tcpdump: data link type LINUX_SLL2 >> tcpdump: verbose output suppressed, use -v[v]... for full protocol decode >> listening on any, link-type LINUX_SLL2 (Linux cooked v2), snapshot length 262144 bytes >> 08:16:19.257697 net2 In IP 194.5.220.43 > 195.141.200.73: ICMP echo request, id 91, seq 1, length 64 >> 08:16:19.257730 net2 Out IP 195.141.200.73 > 194.5.220.43: ICMP echo reply, id 91, seq 1, length 64 >> 08:16:20.250948 net2 In IP 194.5.220.43 > 195.141.200.73: ICMP echo request, id 91, seq 2, length 64 >> 08:16:20.250980 net2 Out IP 195.141.200.73 > 194.5.220.43: ICMP echo reply, id 91, seq 2, length 64 >> >> 3) http to 147.78.195.254 >> >> [9:16] nb3:~% curl -s 147.78.195.254 > /dev/null ; echo $? >> 0 >> / # tcpdump -ni any host 194.5.220.43 >> tcpdump: data link type LINUX_SLL2 >> tcpdump: verbose output suppressed, use -v[v]... 
for full protocol decode >> listening on any, link-type LINUX_SLL2 (Linux cooked v2), snapshot length 262144 bytes >> 08:17:04.082945 net1 In IP 194.5.220.43.39274 > 147.78.195.254.80: Flags [S], seq 1405408358, win 64240, options [mss 1460,sackOK,TS val 1380610701 ecr 0,nop,wscale 7], length 0 >> 08:17:04.082983 net2 Out IP 147.78.195.254.80 > 194.5.220.43.39274: Flags [S.], seq 3790092363, ack 1405408359, win 65160, options [mss 1460,sackOK,TS val 520503591 ecr 1380610701,nop,wscale 7], length 0 >> 08:17:04.089996 net1 In IP 194.5.220.43.39274 > 147.78.195.254.80: Flags [.], ack 1, win 502, options [nop,nop,TS val 1380610709 ecr 520503591], length 0 >> 08:17:04.090121 net1 In IP 194.5.220.43.39274 > 147.78.195.254.80: Flags [P.], seq 1:79, ack 1, win 502, options [nop,nop,TS val 1380610709 ecr 520503591], length 78: HTTP: GET / HTTP/1.1 >> 08:17:04.090136 net2 Out IP 147.78.195.254.80 > 194.5.220.43.39274: Flags [.], ack 79, win 509, options [nop,nop,TS val 520503598 ecr 1380610709], length 0 >> 08:17:04.090301 net2 Out IP 147.78.195.254.80 > 194.5.220.43.39274: Flags [P.], seq 1:239, ack 79, win 509, options [nop,nop,TS val 520503598 ecr 1380610709], length 238: HTTP: HTTP/1.1 200 OK >> 08:17:04.090381 net2 Out IP 147.78.195.254.80 > 194.5.220.43.39274: Flags [P.], seq 239:854, ack 79, win 509, options [nop,nop,TS val 520503598 ecr 1380610709], length 615: HTTP >> 08:17:04.096058 net1 In IP 194.5.220.43.39274 > 147.78.195.254.80: Flags [.], ack 239, win 501, options [nop,nop,TS val 1380610715 ecr 520503598], length 0 >> 08:17:04.096059 net1 In IP 194.5.220.43.39274 > 147.78.195.254.80: Flags [.], ack 854, win 497, options [nop,nop,TS val 1380610715 ecr 520503598], length 0 >> 08:17:04.096339 net1 In IP 194.5.220.43.39274 > 147.78.195.254.80: Flags [F.], seq 79, ack 854, win 501, options [nop,nop,TS val 1380610715 ecr 520503598], length 0 >> 08:17:04.096450 net2 Out IP 147.78.195.254.80 > 194.5.220.43.39274: Flags [F.], seq 854, ack 80, win 509, options 
[nop,nop,TS val 520503604 ecr 1380610715], length 0 >> 08:17:04.102609 net1 In IP 194.5.220.43.39274 > 147.78.195.254.80: Flags [.], ack 855, win 501, options [nop,nop,TS val 1380610721 ecr 520503604], length 0 >> >> >> 4) http to 195.141.200.73 >> >> [9:17] nb3:~% curl -s 195.141.200.73 > /dev/null ; echo $? >> 0 >> >> / # tcpdump -ni any host 194.5.220.43 >> tcpdump: data link type LINUX_SLL2 >> tcpdump: verbose output suppressed, use -v[v]... for full protocol decode >> listening on any, link-type LINUX_SLL2 (Linux cooked v2), snapshot length 262144 bytes >> 08:18:05.951066 net2 In IP 194.5.220.43.41484 > 195.141.200.73.80: Flags [S], seq 1556080700, win 64240, options [mss 1460,sackOK,TS val 765965336 ecr 0,nop,wscale 7], length 0 >> 08:18:05.951106 net2 Out IP 195.141.200.73.80 > 194.5.220.43.41484: Flags [S.], seq 3465881361, ack 1556080701, win 65160, options [mss 1460,sackOK,TS val 3168643538 ecr 765965336,nop,wscale 7], length 0 >> 08:18:05.958699 net2 In IP 194.5.220.43.41484 > 195.141.200.73.80: Flags [.], ack 1, win 502, options [nop,nop,TS val 765965342 ecr 3168643538], length 0 >> 08:18:05.958749 net2 In IP 194.5.220.43.41484 > 195.141.200.73.80: Flags [P.], seq 1:79, ack 1, win 502, options [nop,nop,TS val 765965342 ecr 3168643538], length 78: HTTP: GET / HTTP/1.1 >> 08:18:05.958763 net2 Out IP 195.141.200.73.80 > 194.5.220.43.41484: Flags [.], ack 79, win 509, options [nop,nop,TS val 3168643545 ecr 765965342], length 0 >> 08:18:05.959216 net2 Out IP 195.141.200.73.80 > 194.5.220.43.41484: Flags [P.], seq 1:239, ack 79, win 509, options [nop,nop,TS val 3168643546 ecr 765965342], length 238: HTTP: HTTP/1.1 200 OK >> 08:18:05.959327 net2 Out IP 195.141.200.73.80 > 194.5.220.43.41484: Flags [P.], seq 239:854, ack 79, win 509, options [nop,nop,TS val 3168643546 ecr 765965342], length 615: HTTP >> 08:18:05.965244 net2 In IP 194.5.220.43.41484 > 195.141.200.73.80: Flags [.], ack 239, win 501, options [nop,nop,TS val 765965350 ecr 3168643546], length 0 >> 
08:18:05.965348 net2 In IP 194.5.220.43.41484 > 195.141.200.73.80: Flags [.], ack 854, win 497, options [nop,nop,TS val 765965350 ecr 3168643546], length 0 >> 08:18:05.965487 net2 In IP 194.5.220.43.41484 > 195.141.200.73.80: Flags [F.], seq 79, ack 854, win 501, options [nop,nop,TS val 765965350 ecr 3168643546], length 0 >> 08:18:05.965573 net2 Out IP 195.141.200.73.80 > 194.5.220.43.41484: Flags [F.], seq 854, ack 80, win 509, options [nop,nop,TS val 3168643552 ecr 765965350], length 0 >> 08:18:05.971916 net2 In IP 194.5.220.43.41484 > 195.141.200.73.80: Flags [.], ack 855, win 501, options [nop,nop,TS val 765965356 ecr 3168643552], length 0 >> >> >> >> [0] >> wireguard "server" that changes the source ip: >> >> / # ip a >> 1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1000 >> link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00 >> inet 127.0.0.1/8 scope host lo >> valid_lft forever preferred_lft forever >> inet6 ::1/128 scope host >> valid_lft forever preferred_lft forever >> 3: eth0@if29: <BROADCAST,MULTICAST,UP,LOWER_UP,M-DOWN> mtu 1500 qdisc noqueue state UP >> link/ether 66:4a:9c:12:5b:6c brd ff:ff:ff:ff:ff:ff >> inet6 2a0a:e5c0:10:1e:7f21:83ca:a7d:46d2/128 scope global >> valid_lft forever preferred_lft forever >> inet6 fe80::644a:9cff:fe12:5b6c/64 scope link >> valid_lft forever preferred_lft forever >> 4: net1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000 >> link/ether 3c:ec:ef:cb:d8:1b brd ff:ff:ff:ff:ff:ff >> inet 147.78.195.254/27 brd 147.78.195.255 scope global net1 >> valid_lft forever preferred_lft forever >> inet6 2a0a:e5c0:1:8::53/64 scope global >> valid_lft forever preferred_lft forever >> inet6 fe80::3eec:efff:fecb:d81b/64 scope link >> valid_lft forever preferred_lft forever >> 5: v1477819464: <POINTOPOINT,NOARP,UP,LOWER_UP> mtu 1420 qdisc noqueue state UNKNOWN qlen 1000 >> link/[65534] >> inet 147.78.194.65/26 scope global v1477819464 >> valid_lft forever preferred_lft forever >> inet6 
2a0a:e5c0:2e::1/64 scope global >> valid_lft forever preferred_lft forever >> 26: net2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000 >> link/ether 3c:ec:ef:cb:d8:1c brd ff:ff:ff:ff:ff:ff >> inet 195.141.200.73/31 scope global net2 >> valid_lft forever preferred_lft forever >> inet6 2001:1700:3500:2::12/124 scope global >> valid_lft forever preferred_lft forever >> inet6 fe80::3eec:efff:fecb:d81c/64 scope link >> valid_lft forever preferred_lft forever >> / # >> >> wireguard client behind nat: >> >> nb3:/etc/wireguard# curl -4 ifconfig.io >> 194.5.220.43 >> nb3:/etc/wireguard# ip a sh dev wlan0 >> 2: wlan0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000 >> link/ether 84:5c:f3:ed:52:9c brd ff:ff:ff:ff:ff:ff >> inet 192.168.4.85/24 brd 192.168.4.255 scope global dynamic noprefixroute wlan0 >> valid_lft 317sec preferred_lft 242sec >> inet6 2a0a:e5c0:13:0:865c:f3ff:feed:529c/64 scope global dynamic mngtmpaddr noprefixroute >> valid_lft 86394sec preferred_lft 14394sec >> inet6 fe80::865c:f3ff:feed:529c/64 scope link >> valid_lft forever preferred_lft forever >> nb3:/etc/wireguard# >> >> >> [1] >> / # ip route get 194.5.220.43 >> 194.5.220.43 via 195.141.200.72 dev net2 src 195.141.200.73 >> / # >> >> >> Mike O'Connor <mike@pineview.net> writes: >> >>> Generally all OSs will if sending from a local process will use the >>> address of the outgoing interface for the packet. >>> >>> If the packet is forwarded and no NAT is used the address will be >>> routed via the interface suggested by the routing table. >>> >>> So local routing can be a real pain, policy based routing is an >>> option. The other option could be to setup an 'output' NAT to an >>> address which is multi-homed. >>> >>> I have a system running which is multi-homed with out issue other than >>> the actual routing machine. This machine is BGP connected to three >>> locations. 
>>> >>> There is no NAT setup and because I also add the wireguard link >>> addresses to the BGP sessions. >>> >>> Cheers >>> >>> >>> >>> On 19/2/2023 6:44 am, Nico Schottelius wrote: >>>> Dear group, >>>> >>>> I was wondering how wireguard [Linux kernel] or wireguard-go [FreeBSD] >>>> are supposed to decide which IP address to use for replying? >>>> >>>> I have seen both on FreeBSD and Linux that wireguard seems to use the IP >>>> address of the outgoing interface, i.e. the one with the route returning >>>> to the sender. However in multi homed situations, this can be wrong, >>>> let's take this example: >>>> >>>> 19:57:24.607526 net1 In IP 194.5.220.43.60770 > 147.78.195.254.51820: UDP, length 148 >>>> 19:57:24.608358 net2 Out IP 195.141.200.73.51820 > 194.5.220.43.60770: UDP, length 92 >>>> >>>> The initiator sends from 194.5.220.43 to the receiver 147.78.195.254. >>>> Wireguard then replies with the source IP of 195.141.200.73 instead of >>>> 147.78.195.254. >>>> >>>> As the node is multi homed, the packet might leave through any of its >>>> uplinks and thus return with a random (unexpected) IP address and will >>>> not pass NAT rules on firewalls and finally be dropped. F.i. in above >>>> example the firewall drops the packet from 195.141.200.73, because there >>>> is no session entry for that. >>>> >>>> I have observed this behaviour both on Linux 6.1.11 as well as >>>> wireguard-go 0.0.20220316_8,1 on FreeBSD and in both cases the >>>> connection will break depending on which active interface is taken as >>>> exit. >>>> >>>> I would argue that wireguard should by default invert the IP >>>> addresses, i.e. switch dst=src, src=dst and then reply with that, >>>> instead of adapting an interface specific address, or is there a good >>>> reason for the current behaviour? 
>>>> >>>> Best regards, >>>> >>>> Nico >>>> >>>> -- >>>> Sustainable and modern Infrastructures by ungleich.ch > > -- > Sustainable and modern Infrastructures by ungleich.ch
* Re: Source IP incorrect on multi homed systems
From: Nico Schottelius @ 2023-02-19 12:13 UTC
To: Sebastian Hyrwall
Cc: Nico Schottelius, Mike O'Connor, WireGuard mailing list

Hey Sebastian,

Sebastian Hyrwall <sh@keff.org> writes:

> It is kinda. It's been mentioned multiple times over the years but no one seems to want to fix it. Atleast you should be able to specify bind/src ip in the
> config. I gave up WG because of it. Wasn't accepted by my projects security policy since src ip could not be configured.
>
> There is an unofficial patch however,
>
> https://github.com/torvalds/linux/commit/5fa98082093344c86345f9f63305cae9d5f9f281

The binding is somewhat related to this issue, and I was looking for that
feature some time ago, too. While it is related and I would really
appreciate binding support, I am not sure whether the linked patch
actually fixes the problem I am seeing on multi homed devices.

As long as wireguard does not reply with the same IP address it was
contacted with, packets will get dropped by stateful firewalls, because
the returning packet does not match any entry in the session state table.

Best regards,

Nico

--
Sustainable and modern Infrastructures by ungleich.ch
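The stateful-firewall argument can be made concrete with a toy session table. The following Python sketch is illustrative only: the `Flow` and `StatefulFirewall` names are made up, and real firewalls (for example Linux conntrack) track far more state. It reuses the addresses from the tcpdump in the first message of the thread and assumes 194.5.220.43 is the client's already-NATed public address. The firewall accepts an inbound packet only if it is the exact reverse of a flow it has seen leave.

```python
from __future__ import annotations

from typing import NamedTuple


class Flow(NamedTuple):
    """A UDP flow as a stateful firewall sees it (IPv4 addresses as strings)."""

    src: str
    sport: int
    dst: str
    dport: int

    def reversed(self) -> Flow:
        # Swap the endpoints: this is what a correct reply to the flow looks like.
        return Flow(self.dst, self.dport, self.src, self.sport)


class StatefulFirewall:
    """Toy session table: outbound flows open a session; an inbound packet is
    accepted only if it is the exact reverse of an open session."""

    def __init__(self) -> None:
        self.sessions: set[Flow] = set()

    def outbound(self, flow: Flow) -> None:
        self.sessions.add(flow)

    def inbound_allowed(self, flow: Flow) -> bool:
        return flow.reversed() in self.sessions


if __name__ == "__main__":
    fw = StatefulFirewall()
    handshake = Flow("194.5.220.43", 60770, "147.78.195.254", 51820)
    fw.outbound(handshake)

    expected_reply = handshake.reversed()  # sent from 147.78.195.254:51820
    observed_reply = Flow("195.141.200.73", 51820, "194.5.220.43", 60770)  # as in the tcpdump

    print(fw.inbound_allowed(expected_reply))  # True
    print(fw.inbound_allowed(observed_reply))  # False
```

Nico's proposal (swap source and destination for the reply) corresponds to `reversed()` here. The observed reply changes the source address, so it matches no session and is dropped.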
* Re: Source IP incorrect on multi homed systems
From: Christoph Loesch @ 2023-02-19 14:39 UTC
To: wireguard

Hi,

I don't think it's that no one wants to fix it; several users are having
this issue. I rather guess no one has found a suitable way to fix it yet.

@Nico: did you try to delete the affected route and add it again with the
correct source IP? I mentioned this in
https://lists.zx2c4.com/pipermail/wireguard/2021-November/007324.html

ip route del <NET>
ip route add <NET> dev <ALIAS_DEV> src <SRC_IP>

This way I was able to fix this issue on multi homed systems (at least
temporarily).

Kind regards,
Christoph

On 19.02.2023 at 13:13, Nico Schottelius wrote:
> Hey Sebastian,
>
> Sebastian Hyrwall <sh@keff.org> writes:
>
>> It is kinda. It's been mentioned multiple times over the years but no one seems to want to fix it. Atleast you should be able to specify bind/src ip in the
>> config. I gave up WG because of it. Wasn't accepted by my projects security policy since src ip could not be configured.
>>
>> There is an unofficial patch however,
>>
>> https://github.com/torvalds/linux/commit/5fa98082093344c86345f9f63305cae9d5f9f281
> the binding is somewhat related to this issue and I was looking for that
> feature some time ago, too. While it is correlated and I would really
> appreciate binding support, I am not sure whether the linked patch does
> actually fix the problem I am seeing in multi homed devices.
>
> As long as wireguard does not reply with the same IP address it was
> contacted with, packets will get dropped on stateful firewalls, because
> the returning packet does not match the state session database.
>
> Best regards,
>
> Nico
>
> --
> Sustainable and modern Infrastructures by ungleich.ch
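Whether a route change like Christoph's has the intended effect can be checked by looking at the `src` that `ip route get` reports for the peer's address. The small Python helper below (hypothetical names `parse_route_get` and `replies_from`) assumes the single-line output format shown in Nico's `[1]` section. It only reports the kernel's default source for that route; it makes no claim about how WireGuard itself uses that address.

```python
from __future__ import annotations

ROUTE_TYPES = {"unicast", "local", "broadcast", "multicast", "blackhole", "unreachable", "prohibit"}
KEYS_WITH_VALUE = {"via", "dev", "src", "from", "table", "uid", "mark"}


def parse_route_get(line: str) -> dict[str, str]:
    """Parse the first line of `ip route get <addr>` output into a dict.

    Assumes the usual '[type] <dst> [via <gw>] dev <if> src <addr> ...' layout;
    unknown single-word flags are skipped. A sketch, not a full iproute2 parser.
    """
    tokens = line.split()
    if not tokens:
        raise ValueError("empty route line")
    result: dict[str, str] = {}
    if tokens[0] in ROUTE_TYPES:
        result["type"] = tokens.pop(0)
    result["dst"] = tokens[0]
    i = 1
    while i < len(tokens):
        if tokens[i] in KEYS_WITH_VALUE and i + 1 < len(tokens):
            result[tokens[i]] = tokens[i + 1]
            i += 2
        else:
            i += 1
    return result


def replies_from(route_line: str, contacted: str) -> bool:
    """True if the route's default source equals the address the peer contacted."""
    return parse_route_get(route_line).get("src") == contacted


if __name__ == "__main__":
    # Output shown in Nico's "[1]" section of the thread.
    before = "194.5.220.43 via 195.141.200.72 dev net2 src 195.141.200.73"
    print(parse_route_get(before))
    # {'dst': '194.5.220.43', 'via': '195.141.200.72', 'dev': 'net2', 'src': '195.141.200.73'}
    print(replies_from(before, "147.78.195.254"))  # False
```

Running `ip route get` again after the change and feeding its first line to `replies_from` shows whether the default source now matches the contacted address.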
* Re: Source IP incorrect on multi homed systems
From: David Kerr @ 2023-02-19 16:32 UTC
To: wireguard

Without getting into the debate of whether wireguard is acting
correctly or not, I think there is a possible workaround.

1. In the iptables mangle table PREROUTING, match the incoming
interface and destination address and --set-xmark a firewall MARK
unique to this interface/destination.
2. Create a new ip route table that sets the default route to go out
on the interface with the source address you want (same as the
destination address in iptables).
3. Create a new ip rule that sends all packets with the firewall mark set
in iptables to the routing table you just created.

Repeat the above for each interface/address you need to mangle, with a
unique firewall mark and routing table for each.

It may be necessary to use CONNMARK in PREROUTING and OUTPUT to
--restore-mark. I can't remember if this is needed or not; it's been a
while since I configured iptables with this.

This should ensure that any packet that comes into an
interface/address is replied to from the same interface/address.

David

On Sun, Feb 19, 2023 at 9:44 AM Christoph Loesch <wireguard-mail@chil.at> wrote:
>
> Hi,
>
> I don't think no one wants to fix it, there are several users having this issue. I rather guess no one could find a suitable solution to fix it.
>
> @Nico: did you try to delete the affected route and add it again with the correct source IP ?
>
> as I mentioned it in https://lists.zx2c4.com/pipermail/wireguard/2021-November/007324.html
>
> ip route del <NET>
> ip route add <NET> dev <ALIAS_DEV> src <SRC_IP>
>
> This way I was able to (at least temporary) fix this issue on multi homed systems.
>
> Kind regards,
> Christoph
>
> On 19.02.2023 at 13:13, Nico Schottelius wrote:
> > Hey Sebastian,
> >
> > Sebastian Hyrwall <sh@keff.org> writes:
> >
> >> It is kinda. It's been mentioned multiple times over the years but no one seems to want to fix it. Atleast you should be able to specify bind/src ip in the
> >> config. I gave up WG because of it. Wasn't accepted by my projects security policy since src ip could not be configured.
> >>
> >> There is an unofficial patch however,
> >>
> >> https://github.com/torvalds/linux/commit/5fa98082093344c86345f9f63305cae9d5f9f281
> > the binding is somewhat related to this issue and I was looking for that
> > feature some time ago, too. While it is correlated and I would really
> > appreciate binding support, I am not sure whether the linked patch does
> > actually fix the problem I am seeing in multi homed devices.
> >
> > As long as wireguard does not reply with the same IP address it was
> > contacted with, packets will get dropped on stateful firewalls, because
> > the returning packet does not match the state session database.
> >
> > Best regards,
> >
> > Nico
> >
> > --
> > Sustainable and modern Infrastructures by ungleich.ch
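David's recipe selects a routing table by firewall mark. A simpler variant of the same policy-based routing idea (which Mike also mentions) selects the table by source address with `ip rule add from`, so no iptables state is involved. The Python sketch below only builds iproute2 commands for review; it does not execute them. The `Uplink` type and `policy_routing_commands` function are illustrative names, table numbers starting at 100 are arbitrary, and 192.0.2.1 is a documentation placeholder for net1's gateway, which the thread does not give. 195.141.200.72 for net2 is taken from Nico's `ip route get` output. Whether WireGuard then keeps the contacted address as its source still needs to be confirmed with tcpdump, as done in the thread.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Uplink:
    iface: str    # outgoing interface, e.g. "net1"
    address: str  # local address peers connect to via this uplink
    gateway: str  # next hop for this uplink (placeholder if unknown)


def policy_routing_commands(uplinks: list[Uplink], first_table: int = 100) -> list[str]:
    """Return iproute2 commands giving each local address its own routing table.

    Packets whose source address is `address` are looked up in a dedicated
    table whose default route leaves via the matching interface.
    """
    cmds: list[str] = []
    for offset, up in enumerate(uplinks):
        table = first_table + offset
        cmds.append(f"ip route add default via {up.gateway} dev {up.iface} table {table}")
        cmds.append(f"ip rule add from {up.address} table {table}")
    return cmds


if __name__ == "__main__":
    uplinks = [
        Uplink("net1", "147.78.195.254", "192.0.2.1"),       # gateway is a placeholder
        Uplink("net2", "195.141.200.73", "195.141.200.72"),  # from `ip route get` in the thread
    ]
    for cmd in policy_routing_commands(uplinks):
        print(cmd)
```

Generating the commands instead of running them keeps the sketch side-effect free, so the table layout can be reviewed before being applied as root.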
* Re: Source IP incorrect on multi homed systems
From: Sebastian Hyrvall @ 2023-02-19 16:54 UTC
To: wireguard

You should get into that debate. Proposing firewall workarounds is not a
correct solution, so please don't do it. It needs to be fixed. It's an
immature VPN solution that always just proposes a workaround instead of
fixing the problem. It seems to be designed by people who are good at
software and cryptography but have no clue about networking stacks.

On 2023-02-19 23:32, David Kerr wrote:
> Without getting into the debate of whether wireguard is acting
> correctly or not, I think there is a possible workaround.
>
> 1. In the iptables mangle table PREROUTING, match the incoming
> interface and destination address and --set-xmark a firewall MARK
> unique to this interface/destination
> 2. Create a new ip route table that sets the default route to go out
> on the interface with the source address you want (same as destination
> address in iptables)
> 3. Create a new ip rule that sends all packets with firewall mark set
> in iptables to the routing table you just created
>
> Repeat above for each interface/address you need to mangle, with a
> unique firewall mark and routing table for each.
>
> It may be necessary to use CONNMARK in PREROUTING and OUTPUT to
> --restore_mark. I can't remember if this is needed or not, its been a
> while since I configured iptables with this.
>
> This should ensure that any packet that comes into an
> interface/address is replied to from the same interface/address.
>
> David
>
>
> On Sun, Feb 19, 2023 at 9:44 AM Christoph Loesch <wireguard-mail@chil.at> wrote:
>> Hi,
>>
>> I don't think no one wants to fix it, there are several users having this issue. I rather guess no one could find a suitable solution to fix it.
>>
>> @Nico: did you try to delete the affected route and add it again with the correct source IP ?
>>
>> as I mentioned it in https://lists.zx2c4.com/pipermail/wireguard/2021-November/007324.html
>>
>> ip route del <NET>
>> ip route add <NET> dev <ALIAS_DEV> src <SRC_IP>
>>
>> This way I was able to (at least temporary) fix this issue on multi homed systems.
>>
>> Kind regards,
>> Christoph
>>
>> On 19.02.2023 at 13:13, Nico Schottelius wrote:
>>> Hey Sebastian,
>>>
>>> Sebastian Hyrwall <sh@keff.org> writes:
>>>
>>>> It is kinda. It's been mentioned multiple times over the years but no one seems to want to fix it. Atleast you should be able to specify bind/src ip in the
>>>> config. I gave up WG because of it. Wasn't accepted by my projects security policy since src ip could not be configured.
>>>>
>>>> There is an unofficial patch however,
>>>>
>>>> https://github.com/torvalds/linux/commit/5fa98082093344c86345f9f63305cae9d5f9f281
>>> the binding is somewhat related to this issue and I was looking for that
>>> feature some time ago, too. While it is correlated and I would really
>>> appreciate binding support, I am not sure whether the linked patch does
>>> actually fix the problem I am seeing in multi homed devices.
>>>
>>> As long as wireguard does not reply with the same IP address it was
>>> contacted with, packets will get dropped on stateful firewalls, because
>>> the returning packet does not match the state session database.
>>>
>>> Best regards,
>>>
>>> Nico
>>>
>>> --
>>> Sustainable and modern Infrastructures by ungleich.ch
* Re: Source IP incorrect on multi homed systems
From: Janne Johansson @ 2023-02-19 18:04 UTC
To: Sebastian Hyrvall
Cc: wireguard

On Sun, 19 Feb 2023 at 18:06, Sebastian Hyrvall <sh@keff.org> wrote:
>
> You should get into that debate. Proposing firewall workarounds is not a
> correct solution so please don't do it. It needs to be fixed. It's an
> immature VPN solution that always just proposed a workaround instead of
> fixing the problem.

I would make sure that you are not misattributing the problem* to "an
immature VPN" when it is really the kernel's default UDP behaviour:
picking a working interface to send packets from based on the routing
table, which any and all UDP-based tunnels would suffer from equally. If
you google it, you may find that other UDP transports face the same
"problem".

*) https://en.wiktionary.org/wiki/Chesterton%27s_fence

--
May the most significant bit of your life be positive.
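Janne's point about the kernel's default behaviour can be observed from user space. When a UDP socket that is not bound to a specific address is connected, the kernel performs the route lookup and fixes the source address without sending any packet, and `getsockname()` then reports that address. The Python sketch below (the function name is made up) uses this well-known trick. On Nico's server, asking for 194.5.220.43 would be expected to report the `src` from `ip route get`, 195.141.200.73, regardless of which local address the peer originally contacted.

```python
from __future__ import annotations

import socket


def kernel_source_address(dst: str, port: int = 51820) -> str:
    """Ask the kernel which local IPv4 address it would use to reach `dst`.

    connect() on a UDP socket sends no packets; it only performs the route
    lookup and binds the socket to the source address that lookup selects,
    which getsockname() then reports. The port (51820, as in the thread's
    tcpdump) does not affect the result.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.connect((dst, port))
        return s.getsockname()[0]


if __name__ == "__main__":
    print(kernel_source_address("127.0.0.1"))  # 127.0.0.1
```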
* Re: Source IP incorrect on multi homed systems 2023-02-19 18:04 ` Janne Johansson @ 2023-02-19 18:08 ` Sebastian Hyrvall 2023-02-19 20:11 ` Nico Schottelius 1 sibling, 0 replies; 34+ messages in thread From: Sebastian Hyrvall @ 2023-02-19 18:08 UTC To: Janne Johansson; +Cc: wireguard It is the default behavior of the kernel, but all networking software that deals with security knows how to behave correctly. You are welcome to point me to something else that suffers from the same problem. On 2023-02-20 01:04, Janne Johansson wrote: > On Sun 19 Feb 2023 at 18:06, Sebastian Hyrvall <sh@keff.org> wrote: >> You should get into that debate. Proposing firewall workarounds is not a >> correct solution so please don't do it. It needs to be fixed. It's an >> immature VPN solution that always just proposed a workaround instead of >> fixing the problem. > I would make sure that you are not mis-ascribing the problem* to "an > immature VPN" and not what the default UDP behaviour of the kernel is, > to pick a working interface to send packets from based on the routing > table, in which any/all udp based tunnel would suffer the same > problem. If you google it, you may find that other udp transports face > the same "problem". > > *) https://en.wiktionary.org/wiki/Chesterton%27s_fence >
* Re: Source IP incorrect on multi homed systems 2023-02-19 18:04 ` Janne Johansson 2023-02-19 18:08 ` Sebastian Hyrvall @ 2023-02-19 20:11 ` Nico Schottelius 1 sibling, 0 replies; 34+ messages in thread From: Nico Schottelius @ 2023-02-19 20:11 UTC To: Janne Johansson; +Cc: Sebastian Hyrvall, wireguard Hey Janne, Janne Johansson <icepic.dz@gmail.com> writes: > *) https://en.wiktionary.org/wiki/Chesterton%27s_fence I am happy to have learned a new principle today, thanks for that. And to make sure everyone is on the same page: by default, WireGuard should reply using the address the incoming packet was sent to as its source address, but at the moment it does not do that. If anyone disagrees with the above statement, please let me know. -- Sustainable and modern Infrastructures by ungleich.ch
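Nico's rule, and the reason the capture in the original report fails, can be expressed as a small check. This is a deliberately simplified model of a stateful firewall: it ignores NAT rewriting and timeouts. `Datagram`, `expected_reply` and `matches_state` are hypothetical names. Under this model, a reply is accepted only if its 4-tuple is the exact inverse of the request's.

```python
from typing import NamedTuple


class Datagram(NamedTuple):
    src: str
    sport: int
    dst: str
    dport: int


def expected_reply(request: Datagram) -> Datagram:
    """The reply Nico asks for: source and destination swapped."""
    return Datagram(request.dst, request.dport, request.src, request.sport)


def matches_state(request: Datagram, reply: Datagram) -> bool:
    """Simplified stateful-firewall lookup: only the exact inverse 4-tuple matches."""
    return reply == expected_reply(request)


# The two packets from the capture in the original report.
request = Datagram("194.5.220.43", 60770, "147.78.195.254", 51820)
observed = Datagram("195.141.200.73", 51820, "194.5.220.43", 60770)

print(matches_state(request, observed))                 # False: dropped as unsolicited
print(matches_state(request, expected_reply(request)))  # True
```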
* Re: Source IP incorrect on multi homed systems 2023-02-19 16:32 ` David Kerr 2023-02-19 16:54 ` Sebastian Hyrvall @ 2023-02-19 17:05 ` tlhackque [not found] ` <CADGd2DoE6TCtCxxWL7JWyNW5+yy_Pe+9MNzHznbudMWLTXQreA@mail.gmail.com> ` (2 more replies) 1 sibling, 3 replies; 34+ messages in thread From: tlhackque @ 2023-02-19 17:05 UTC (permalink / raw) To: wireguard [-- Attachment #1.1: Type: text/plain, Size: 3840 bytes --] FWIW, while clever, I don't think that iptables mark solves all cases. E.g., consider an interface with multiple addresses, where a packet comes in on a secondary address. The proposed rule would send it out the right interface, but still with the wrong (primary) address picked from the interface... With IPv6 it's common to assign an address to a service rather than a host so services can move easily. So multiple addresses per interface are the rule, not the exception. I do the same with IPv4 inside addresses, though these days public IPv4 addresses are scarce enough that it's not common for public IPs. It amounts to the same issue - the NAT tracking is stateful. Trying to work around this with routing seems like a maze of twisty passages - so I agree that the right solution is for WG to respond from the address that receives a packet. On 19-Feb-23 11:32, David Kerr wrote: > Without getting into the debate of whether wireguard is acting > correctly or not, I think there is a possible workaround. > > 1. In the iptables mangle table PREROUTING, match the incoming > interface and destination address and --set-xmark a firewall MARK > unique to this interface/destination > 2. Create a new ip route table that sets the default route to go out > on the interface with the source address you want (same as destination > address in iptables) > 3. 
Create a new ip rule that sends all packets with firewall mark set > in iptables to the routing table you just created > > Repeat above for each interface/address you need to mangle, with a > unique firewall mark and routing table for each. > > It may be necessary to use CONNMARK in PREROUTING and OUTPUT to > --restore_mark. I can't remember if this is needed or not, its been a > while since I configured iptables with this. > > This should ensure that any packet that comes into an > interface/address is replied to from the same interface/address. > > David > > > On Sun, Feb 19, 2023 at 9:44 AM Christoph Loesch<wireguard-mail@chil.at> wrote: >> Hi, >> >> I don't think no one wants to fix it, there are several users having this issue. I rather guess no one could find a suitable solution to fix it. >> >> @Nico: did you try to delete the affected route and add it again with the correct source IP ? >> >> as I mentioned it inhttps://lists.zx2c4.com/pipermail/wireguard/2021-November/007324.html >> >> ip route del <NET> >> ip route add <NET> dev <ALIAS_DEV> src <SRC_IP> >> >> This way I was able to (at least temporary) fix this issue on multi homed systems. >> >> Kind regards, >> Christoph >> >> Am 19.02.2023 um 13:13 schrieb Nico Schottelius: >>> Hey Sebastian, >>> >>> Sebastian Hyrwall<sh@keff.org> writes: >>> >>>> It is kinda. It's been mentioned multiple times over the years but no one seems to want to fix it. Atleast you should be able to specify bind/src ip in the >>>> config. I gave up WG because of it. Wasn't accepted by my projects security policy since src ip could not be configured. >>>> >>>> There is an unofficial patch however, >>>> >>>> https://github.com/torvalds/linux/commit/5fa98082093344c86345f9f63305cae9d5f9f281 >>> the binding is somewhat related to this issue and I was looking for that >>> feature some time ago, too. 
While it is correlated and I would really >>> appreciate binding support, I am not sure whether the linked patch does >>> actually fix the problem I am seeing in multi homed devices. >>> >>> As long as wireguard does not reply with the same IP address it was >>> contacted with, packets will get dropped on stateful firewalls, because >>> the returning packet does not match the state session database. >>> >>> Best regards, >>> >>> Nico >>> >>> -- >>> Sustainable and modern Infrastructures by ungleich.ch [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 840 bytes --] ^ permalink raw reply [flat|nested] 34+ messages in thread
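David Kerr's three-step recipe, quoted above, can be sketched as a Python helper. The helper only returns the `iptables`/`ip` commands for each interface/address pair; it does not run them. It allocates a unique mark and routing table per pair. This is an untested illustration with these assumptions:

- The interface names, gateways, starting mark and table numbers are hypothetical.
- The CONNMARK lines follow David's remark that CONNMARK "may be necessary".
- 51820 is reserved only on the assumption that it is wg-quick's default fwmark on your system.

```python
from dataclasses import dataclass
from typing import Iterable, List

WG_QUICK_DEFAULT_FWMARK = 51820  # assumption: wg-quick's usual default fwmark


@dataclass(frozen=True)
class Uplink:
    iface: str    # incoming interface, e.g. "net1" (hypothetical)
    address: str  # local address peers connect to on that interface
    gateway: str  # next hop for this uplink (hypothetical)


def policy_commands(uplinks: Iterable[Uplink], first_mark: int = 0x100,
                    first_table: int = 100, use_connmark: bool = True,
                    reserved_marks: Iterable[int] = (WG_QUICK_DEFAULT_FWMARK,)) -> List[str]:
    """Return (but do not run) the commands for David's recipe, one mark/table per pair."""
    reserved = set(reserved_marks)
    cmds: List[str] = []
    for i, up in enumerate(uplinks):
        mark, table = first_mark + i, first_table + i
        if mark in reserved:
            raise ValueError(f"mark {mark:#x} collides with a reserved mark")
        match = f"-i {up.iface} -d {up.address}"
        # Step 1: mark packets arriving on this interface/destination pair.
        cmds.append(f"iptables -t mangle -A PREROUTING {match} "
                    f"-j MARK --set-xmark {mark:#x}/0xffffffff")
        if use_connmark:
            # Copy the packet mark onto the conntrack entry ...
            cmds.append(f"iptables -t mangle -A PREROUTING {match} -j CONNMARK --save-mark")
            # ... and copy it back onto locally generated packets of that flow.
            cmds.append(f"iptables -t mangle -A OUTPUT -m connmark --mark {mark:#x} "
                        f"-j CONNMARK --restore-mark")
        # Step 2: a table whose default route leaves via this uplink with this source.
        cmds.append(f"ip route add default via {up.gateway} dev {up.iface} "
                    f"src {up.address} table {table}")
        # Step 3: look up marked packets in that table.
        cmds.append(f"ip rule add fwmark {mark:#x} table {table}")
    return cmds


if __name__ == "__main__":
    uplinks = [Uplink("net1", "147.78.195.254", "192.0.2.1"),
               Uplink("net2", "195.141.200.73", "198.51.100.1")]
    print("\n".join(policy_commands(uplinks)))
```

Restoring the connection mark in OUTPUT overwrites any socket mark on those flows, for example one set with `wg set <interface> fwmark`. This is one of the interactions with other mark users that tlhackque points out later in the thread.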
* Fwd: Source IP incorrect on multi homed systems [not found] ` <CADGd2DoE6TCtCxxWL7JWyNW5+yy_Pe+9MNzHznbudMWLTXQreA@mail.gmail.com> @ 2023-02-19 18:30 ` John Lauro 2023-02-19 22:28 ` tlhackque 1 sibling, 0 replies; 34+ messages in thread From: John Lauro @ 2023-02-19 18:30 UTC To: WireGuard mailing list I think the ip route with src would work, but only as a short-lived workaround. The problem is that with dynamic routes, the route could go away when a link goes down and then come back without the src setting. You would need the BGP software to add the src. UDP is connectionless, so sending a reply back out the same way a packet came in isn't strictly equivalent: the streams are not tied together the way they would be with TCP on nginx or with an ICMP reply. You should be able to whitelist the UDP port on the NAT devices, as that shouldn't rely on state information. I am not sure whether you are attempting to do site-to-site or client-to-server/site, and which end has the NAT (or both). What I do for site-to-site is use a different port for each connection and have a separate BGP session for each possible connection (i.e. a different one for each network provider). I have a full mesh with 8 sites and up to 3 providers per site. That said, you probably have floating IPs on the client side and don't want to lock in a single IP on the multi-homed server side? You could NAT the incoming IPs at the border to an internal IP and then lock the WireGuard server to a single private IP for both in and out; that border NAT would force the reply back to the same gateway it came in from. I know you don't want workarounds; I just want to mention that a single stream is not comparable to something that routes traffic through it. As you are doing BGP and redundant routes, I assume you also reset rp_filter on all NAT/WireGuard/router hosts so that they will allow packets to come from different sources.
On Sun, Feb 19, 2023 at 12:07 PM tlhackque <tlhackque@yahoo.com> wrote: > > FWIW, while clever, I don't think that iptables mark solves all cases. > E.g., consider an interface with multiple addresses, where a packet > comes in on a secondary address. The proposed rule would send it out > the right interface, but still with the wrong (primary) address picked > from the interface... > > With IPv6 it's common to assign an address to a service rather than a > host so services can move easily. So multiple addresses per interface > are the rule, not the exception. > > I do the same with IPv4 inside addresses, though these days public IPv4 > addresses are scarce enough that it's not common for public IPs. It > amounts to the same issue - the NAT tracking is stateful. > > Trying to work around this with routing seems like a maze of twisty > passages - so I agree that the right solution is for WG to respond from > the address that receives a packet.
* Re: Source IP incorrect on multi homed systems [not found] ` <CADGd2DoE6TCtCxxWL7JWyNW5+yy_Pe+9MNzHznbudMWLTXQreA@mail.gmail.com> 2023-02-19 18:30 ` Fwd: " John Lauro @ 2023-02-19 22:28 ` tlhackque 2023-02-20 0:58 ` Luiz Angelo Daros de Luca 1 sibling, 1 reply; 34+ messages in thread From: tlhackque @ 2023-02-19 22:28 UTC (permalink / raw) To: John Lauro; +Cc: wireguard [-- Attachment #1.1: Type: text/plain, Size: 7599 bytes --] Actually in my case (I'm not the originator of this thread), I don't run BGP. But I do have both site-site and mobile-site clients. Much simpler environment, but same issue. I do understand UDP. As I've noted, DNS UDP has the same issue, and an RFC was issued to clarify that responses MUST come from the address on which a query is received. WG isn't quite the same, as it isn't a request/response protocol. But it is a flow between two endpoints, and NAT/firewalls will open a pinhole for incoming packets when they see an outbound packet. One of the nice things about WG is that except for this issue, it has no dependencies on custom routing (or anything but UDP) and "just works". It should "just work" on multihomed hosts, without handstands, BGP routing, different ports, and the like. It also needs to work where it's not feasible to layer on work-arounds, such as VPSs where you don't get to pick your kernel...or your firewall. Picking stable endpoint addresses would make the traffic look like the kind of flow that these middleboxes recognize, and things would "just work". On 19-Feb-23 13:25, John Lauro wrote: > I think the ip route with src would work, but only as a short lived > work around. The problem with that is if dealing with dynamic routes > is it could go a way when a link is down and then come back and the > src setting would be lost. You would need the bgp software to add the > src. > > UDP is connectionless. Sending back out the same as it's coming in > isn't strictly the same. 
The streams are not attached the same as > they would be with TCP on nginx or a reply with icmp. You should be > able to whitelist the udp port on the NAT devices, as it shouldn't use > state info. > > I am not sure if you are attempting to do site to site or client to > server/site and which end has the NAT (or both). What I do for site to > site is use a different port for each connection and have a separate > BGP connection for each possible connection (ie: different one for > different network providers). Have a full mesh with 8 sites and upto > 3 providers per site. > > That said, you probably have floating IPs on the client side, and > don't want to lock in a single IP on the multi-homed server side? You > could nat the incoming IPs on the border from an internal IP and then > then lock to a single private IP on the wireguard server for in/out > and that border nat would force the reply back to the same gateway it > came in from. > > I know, you don't want work arounds, just want to mention it's not the > same as comparing a single stream to something that handles routing > though it. As you are doing bgp and redundant routes I assume you > also reset rp_filter on all nat/wireguard/routers so the routers will > allow packets to come from different sources. > > On Sun, Feb 19, 2023 at 12:07 PM tlhackque <tlhackque@yahoo.com> wrote: > > FWIW, while clever, I don't think that iptables mark solves all > cases. > E.g., consider an interface with multiple addresses, where a packet > comes in on a secondary address. The proposed rule would send it out > the right interface, but still with the wrong (primary) address > picked > from the interface... > > With IPv6 it's common to assign an address to a service rather than a > host so services can move easily. So multiple addresses per > interface > are the rule, not the exception. 
> > I do the same with IPv4 inside addresses, though these days public > IPv4 > addresses are scarce enough that it's not common for public IPs. It > amounts to the same issue - the NAT tracking is stateful. > > Trying to work around this with routing seems like a maze of twisty > passages - so I agree that the right solution is for WG to respond > from > the address that receives a packet. > > On 19-Feb-23 11:32, David Kerr wrote: > > Without getting into the debate of whether wireguard is acting > > correctly or not, I think there is a possible workaround. > > > > 1. In the iptables mangle table PREROUTING, match the incoming > > interface and destination address and --set-xmark a firewall MARK > > unique to this interface/destination > > 2. Create a new ip route table that sets the default route to go out > > on the interface with the source address you want (same as > destination > > address in iptables) > > 3. Create a new ip rule that sends all packets with firewall > mark set > > in iptables to the routing table you just created > > > > Repeat above for each interface/address you need to mangle, with a > > unique firewall mark and routing table for each. > > > > It may be necessary to use CONNMARK in PREROUTING and OUTPUT to > > --restore_mark. I can't remember if this is needed or not, its > been a > > while since I configured iptables with this. > > > > This should ensure that any packet that comes into an > > interface/address is replied to from the same interface/address. > > > > David > > > > > > On Sun, Feb 19, 2023 at 9:44 AM Christoph > Loesch<wireguard-mail@chil.at> wrote: > >> Hi, > >> > >> I don't think no one wants to fix it, there are several users > having this issue. I rather guess no one could find a suitable > solution to fix it. > >> > >> @Nico: did you try to delete the affected route and add it > again with the correct source IP ? 
> >> > >> as I mentioned it > inhttps://lists.zx2c4.com/pipermail/wireguard/2021-November/007324.html > <http://lists.zx2c4.com/pipermail/wireguard/2021-November/007324.html> > >> > >> ip route del <NET> > >> ip route add <NET> dev <ALIAS_DEV> src <SRC_IP> > >> > >> This way I was able to (at least temporary) fix this issue on > multi homed systems. > >> > >> Kind regards, > >> Christoph > >> > >> Am 19.02.2023 um 13:13 schrieb Nico Schottelius: > >>> Hey Sebastian, > >>> > >>> Sebastian Hyrwall<sh@keff.org> writes: > >>> > >>>> It is kinda. It's been mentioned multiple times over the > years but no one seems to want to fix it. Atleast you should be > able to specify bind/src ip in the > >>>> config. I gave up WG because of it. Wasn't accepted by my > projects security policy since src ip could not be configured. > >>>> > >>>> There is an unofficial patch however, > >>>> > >>>> > https://github.com/torvalds/linux/commit/5fa98082093344c86345f9f63305cae9d5f9f281 > >>> the binding is somewhat related to this issue and I was > looking for that > >>> feature some time ago, too. While it is correlated and I would > really > >>> appreciate binding support, I am not sure whether the linked > patch does > >>> actually fix the problem I am seeing in multi homed devices. > >>> > >>> As long as wireguard does not reply with the same IP address > it was > >>> contacted with, packets will get dropped on stateful > firewalls, because > >>> the returning packet does not match the state session database. > >>> > >>> Best regards, > >>> > >>> Nico > >>> > >>> -- > >>> Sustainable and modern Infrastructures by ungleich.ch > <http://ungleich.ch> > [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 840 bytes --] ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: Source IP incorrect on multi homed systems 2023-02-19 22:28 ` tlhackque @ 2023-02-20 0:58 ` Luiz Angelo Daros de Luca 0 siblings, 0 replies; 34+ messages in thread From: Luiz Angelo Daros de Luca @ 2023-02-20 0:58 UTC To: tlhackque; +Cc: John Lauro, wireguard Yes, wg is not a request/response protocol, but it does keep some state. Couldn't WireGuard remember the last local address each peer sent traffic to? That would be just like the tracking already used for the peer's IP address. If there is a "last address", it would be nice if we could hint to the kernel to use it as the source address, with a fallback to the current behavior if the address is no longer available. That might solve a couple of problems. I just don't know whether it is possible to hint the source address without enforcing it. If not, wg would have to handle the cases where the address is gone. Regards, Luiz
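Luiz's suggestion can be sketched as a per-peer table. The table is updated on every authenticated packet, much like the endpoint roaming WireGuard already does for the peer's address, and it is consulted when sending. This is a hypothetical illustration in Python, not code from either WireGuard implementation. `SourceHintTable`, `on_receive`, `source_for` and the `local_addr_exists` callback are invented names. A return value of `None` stands for "let the routing table decide", which is the current behaviour.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple

Endpoint = Tuple[str, int]


@dataclass
class PeerState:
    endpoint: Endpoint                # remote address/port, updated on roaming
    local_addr: Optional[str] = None  # our address the peer last reached us on


class SourceHintTable:
    """Hypothetical per-peer memory of the local address each peer last used."""

    def __init__(self, local_addr_exists: Callable[[str], bool]):
        self._peers: Dict[str, PeerState] = {}
        self._local_addr_exists = local_addr_exists

    def on_receive(self, peer_key: str, src: Endpoint, dst_addr: str) -> None:
        # Record both ends of the last authenticated packet from this peer.
        self._peers[peer_key] = PeerState(endpoint=src, local_addr=dst_addr)

    def source_for(self, peer_key: str) -> Optional[str]:
        """Return a source-address hint, or None to fall back to the routing table."""
        state = self._peers.get(peer_key)
        if state is None or state.local_addr is None:
            return None
        if not self._local_addr_exists(state.local_addr):
            # The address is gone (link down, address removed): drop the hint.
            state.local_addr = None
            return None
        return state.local_addr


if __name__ == "__main__":
    addresses = {"147.78.195.254", "195.141.200.73"}
    hints = SourceHintTable(lambda addr: addr in addresses)
    hints.on_receive("peerA", ("194.5.220.43", 60770), "147.78.195.254")
    print(hints.source_for("peerA"))
```

The fallback path covers Luiz's open question: if a hint cannot be enforced safely, the sender simply stops using it once the address disappears.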
* Re: Source IP incorrect on multi homed systems 2023-02-19 17:05 ` tlhackque [not found] ` <CADGd2DoE6TCtCxxWL7JWyNW5+yy_Pe+9MNzHznbudMWLTXQreA@mail.gmail.com> @ 2023-02-19 18:37 ` David Kerr 2023-02-19 18:52 ` tlhackque 2023-02-19 18:42 ` tlhackque 2 siblings, 1 reply; 34+ messages in thread From: David Kerr @ 2023-02-19 18:37 UTC (permalink / raw) To: wireguard My proposed workaround specifically stated to match on both the interface and destination address, and to set a route with both interface and [source] address. This allows for multiple IP addresses on the same interface -- which you can do with both IPv4 and IPv6. But yes, it is a nasty hack. You really need to understand what is going on between the firewall and routing tables/rules and it is easy to get confused. On Sun, Feb 19, 2023 at 12:10 PM tlhackque <tlhackque@yahoo.com> wrote: > > FWIW, while clever, I don't think that iptables mark solves all cases. > E.g., consider an interface with multiple addresses, where a packet > comes in on a secondary address. The proposed rule would send it out > the right interface, but still with the wrong (primary) address picked > from the interface... > > With IPv6 it's common to assign an address to a service rather than a > host so services can move easily. So multiple addresses per interface > are the rule, not the exception. > > I do the same with IPv4 inside addresses, though these days public IPv4 > addresses are scarce enough that it's not common for public IPs. It > amounts to the same issue - the NAT tracking is stateful. > > Trying to work around this with routing seems like a maze of twisty > passages - so I agree that the right solution is for WG to respond from > the address that receives a packet. > > On 19-Feb-23 11:32, David Kerr wrote: > > Without getting into the debate of whether wireguard is acting > > correctly or not, I think there is a possible workaround. > > > > 1. 
In the iptables mangle table PREROUTING, match the incoming > > interface and destination address and --set-xmark a firewall MARK > > unique to this interface/destination > > 2. Create a new ip route table that sets the default route to go out > > on the interface with the source address you want (same as destination > > address in iptables) > > 3. Create a new ip rule that sends all packets with firewall mark set > > in iptables to the routing table you just created > > > > Repeat above for each interface/address you need to mangle, with a > > unique firewall mark and routing table for each. > > > > It may be necessary to use CONNMARK in PREROUTING and OUTPUT to > > --restore_mark. I can't remember if this is needed or not, its been a > > while since I configured iptables with this. > > > > This should ensure that any packet that comes into an > > interface/address is replied to from the same interface/address. > > > > David > > > > > > On Sun, Feb 19, 2023 at 9:44 AM Christoph Loesch<wireguard-mail@chil.at> wrote: > >> Hi, > >> > >> I don't think no one wants to fix it, there are several users having this issue. I rather guess no one could find a suitable solution to fix it. > >> > >> @Nico: did you try to delete the affected route and add it again with the correct source IP ? > >> > >> as I mentioned it inhttps://lists.zx2c4.com/pipermail/wireguard/2021-November/007324.html > >> > >> ip route del <NET> > >> ip route add <NET> dev <ALIAS_DEV> src <SRC_IP> > >> > >> This way I was able to (at least temporary) fix this issue on multi homed systems. > >> > >> Kind regards, > >> Christoph > >> > >> Am 19.02.2023 um 13:13 schrieb Nico Schottelius: > >>> Hey Sebastian, > >>> > >>> Sebastian Hyrwall<sh@keff.org> writes: > >>> > >>>> It is kinda. It's been mentioned multiple times over the years but no one seems to want to fix it. Atleast you should be able to specify bind/src ip in the > >>>> config. I gave up WG because of it. 
Wasn't accepted by my projects security policy since src ip could not be configured. > >>>> > >>>> There is an unofficial patch however, > >>>> > >>>> https://github.com/torvalds/linux/commit/5fa98082093344c86345f9f63305cae9d5f9f281 > >>> the binding is somewhat related to this issue and I was looking for that > >>> feature some time ago, too. While it is correlated and I would really > >>> appreciate binding support, I am not sure whether the linked patch does > >>> actually fix the problem I am seeing in multi homed devices. > >>> > >>> As long as wireguard does not reply with the same IP address it was > >>> contacted with, packets will get dropped on stateful firewalls, because > >>> the returning packet does not match the state session database. > >>> > >>> Best regards, > >>> > >>> Nico > >>> > >>> -- > >>> Sustainable and modern Infrastructures by ungleich.ch > ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: Source IP incorrect on multi homed systems 2023-02-19 18:37 ` David Kerr @ 2023-02-19 18:52 ` tlhackque 0 siblings, 0 replies; 34+ messages in thread From: tlhackque @ 2023-02-19 18:52 UTC (permalink / raw) To: wireguard [-- Attachment #1.1: Type: text/plain, Size: 5226 bytes --] On 19-Feb-23 13:37, David Kerr wrote: > My proposed workaround specifically stated to match on both the > interface and destination address, and to set a route with both > interface and [source] address. This allows for multiple IP addresses > on the same interface -- which you can do with both IPv4 and IPv6. Fair enough. Of course, that means having a unique rule and mark for each if/destination address, which you now have to manage - and avoid conflicts with all other uses of mark. One of which is wg-quick... "manage" includes remembering to add/remove the rule and allocate/deallocate the mark synchronously with wg-enabled IP addresses - and if wg is listening on all addresses, that means every ip address. You can get there, but as I said, it's a maze of twisty passages and the complications of managing it pile up. > But yes, it is a nasty hack. You really need to understand what is > going on between the firewall and routing tables/rules and it is easy > to get confused. > > > On Sun, Feb 19, 2023 at 12:10 PM tlhackque<tlhackque@yahoo.com> wrote: >> FWIW, while clever, I don't think that iptables mark solves all cases. >> E.g., consider an interface with multiple addresses, where a packet >> comes in on a secondary address. The proposed rule would send it out >> the right interface, but still with the wrong (primary) address picked >> from the interface... >> >> With IPv6 it's common to assign an address to a service rather than a >> host so services can move easily. So multiple addresses per interface >> are the rule, not the exception. >> >> I do the same with IPv4 inside addresses, though these days public IPv4 >> addresses are scarce enough that it's not common for public IPs. 
It >> amounts to the same issue - the NAT tracking is stateful. >> >> Trying to work around this with routing seems like a maze of twisty >> passages - so I agree that the right solution is for WG to respond from >> the address that receives a packet. >> >> On 19-Feb-23 11:32, David Kerr wrote: >>> Without getting into the debate of whether wireguard is acting >>> correctly or not, I think there is a possible workaround. >>> >>> 1. In the iptables mangle table PREROUTING, match the incoming >>> interface and destination address and --set-xmark a firewall MARK >>> unique to this interface/destination >>> 2. Create a new ip route table that sets the default route to go out >>> on the interface with the source address you want (same as destination >>> address in iptables) >>> 3. Create a new ip rule that sends all packets with firewall mark set >>> in iptables to the routing table you just created >>> >>> Repeat above for each interface/address you need to mangle, with a >>> unique firewall mark and routing table for each. >>> >>> It may be necessary to use CONNMARK in PREROUTING and OUTPUT to >>> --restore_mark. I can't remember if this is needed or not, its been a >>> while since I configured iptables with this. >>> >>> This should ensure that any packet that comes into an >>> interface/address is replied to from the same interface/address. >>> >>> David >>> >>> >>> On Sun, Feb 19, 2023 at 9:44 AM Christoph Loesch<wireguard-mail@chil.at> wrote: >>>> Hi, >>>> >>>> I don't think no one wants to fix it, there are several users having this issue. I rather guess no one could find a suitable solution to fix it. >>>> >>>> @Nico: did you try to delete the affected route and add it again with the correct source IP ? 
>>>> >>>> as I mentioned it inhttps://lists.zx2c4.com/pipermail/wireguard/2021-November/007324.html >>>> >>>> ip route del <NET> >>>> ip route add <NET> dev <ALIAS_DEV> src <SRC_IP> >>>> >>>> This way I was able to (at least temporary) fix this issue on multi homed systems. >>>> >>>> Kind regards, >>>> Christoph >>>> >>>> Am 19.02.2023 um 13:13 schrieb Nico Schottelius: >>>>> Hey Sebastian, >>>>> >>>>> Sebastian Hyrwall<sh@keff.org> writes: >>>>> >>>>>> It is kinda. It's been mentioned multiple times over the years but no one seems to want to fix it. Atleast you should be able to specify bind/src ip in the >>>>>> config. I gave up WG because of it. Wasn't accepted by my projects security policy since src ip could not be configured. >>>>>> >>>>>> There is an unofficial patch however, >>>>>> >>>>>> https://github.com/torvalds/linux/commit/5fa98082093344c86345f9f63305cae9d5f9f281 >>>>> the binding is somewhat related to this issue and I was looking for that >>>>> feature some time ago, too. While it is correlated and I would really >>>>> appreciate binding support, I am not sure whether the linked patch does >>>>> actually fix the problem I am seeing in multi homed devices. >>>>> >>>>> As long as wireguard does not reply with the same IP address it was >>>>> contacted with, packets will get dropped on stateful firewalls, because >>>>> the returning packet does not match the state session database. >>>>> >>>>> Best regards, >>>>> >>>>> Nico >>>>> >>>>> -- >>>>> Sustainable and modern Infrastructures by ungleich.ch -- This communication may not represent my employer's views, if any, on the matters discussed. [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 840 bytes --] ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: Source IP incorrect on multi homed systems 2023-02-19 17:05 ` tlhackque [not found] ` <CADGd2DoE6TCtCxxWL7JWyNW5+yy_Pe+9MNzHznbudMWLTXQreA@mail.gmail.com> 2023-02-19 18:37 ` David Kerr @ 2023-02-19 18:42 ` tlhackque 2023-02-19 20:18 ` Nico Schottelius 2 siblings, 1 reply; 34+ messages in thread From: tlhackque @ 2023-02-19 18:42 UTC To: wireguard BTW, DNS is a common UDP (well, mostly) protocol that encountered the same issue. See RFC 2181 <https://www.rfc-editor.org/rfc/rfc2181.html> (1997), where you'll find (emphasis added): > 4 <https://www.rfc-editor.org/rfc/rfc2181.html#section-4>. Server > Reply Source Address Selection > > Most, if not all, DNS clients, expect the address from which a reply > is received to be the same address as that to which the query > eliciting the reply was sent. This is true for servers acting as > clients for the purposes of recursive query resolution, as well as > simple resolver clients. The address, along with the identifier (ID) > in the reply is used for disambiguating replies, and filtering > spurious responses. This may, or may not, have been intended when > the DNS was designed, but is now a fact of life. > > Some multi-homed hosts running DNS servers generate a reply using a > source address that is not the same as the destination address from > the client's request packet. > *Such replies will be discarded by the client because the source > address of the reply does not match that of a host to which the client > sent the original request.* That is, it > appears to be an unsolicited response. > > 4.1 <https://www.rfc-editor.org/rfc/rfc2181.html#section-4.1>. UDP
UDP > Source Address Selection > > ***To avoid these problems, servers when responding to queries using > UDP _must _cause the reply to be sent with the source address field in > the IP header set to the address that was in the destination address > field of the IP header of the packet containing the query causing the > response.** * > If this would cause the response to be sent from an IP > address that is not permitted for this purpose, then the response may > be sent from any legal IP address allocated to the server. That > address should be chosen to maximise the possibility that the client > will be able to use it for further queries. Servers configured in > such a way that not all their addresses are equally reachable from > all potential clients need take particular care when responding to > queries sent to anycast, multicast, or similar, addresses. > On 19-Feb-23 12:05, tlhackque wrote: > FWIW, while clever, I don't think that iptables mark solves all cases. > E.g., consider an interface with multiple addresses, where a packet > comes in on a secondary address. The proposed rule would send it out > the right interface, but still with the wrong (primary) address picked > from the interface... > > With IPv6 it's common to assign an address to a service rather than a > host so services can move easily. So multiple addresses per interface > are the rule, not the exception. > > I do the same with IPv4 inside addresses, though these days public > IPv4 addresses are scarce enough that it's not common for public IPs. > It amounts to the same issue - the NAT tracking is stateful. > > Trying to work around this with routing seems like a maze of twisty > passages - so I agree that the right solution is for WG to respond > from the address that receives a packet. > > On 19-Feb-23 11:32, David Kerr wrote: >> Without getting into the debate of whether wireguard is acting >> correctly or not, I think there is a possible workaround. >> >> 1. 
In the iptables mangle table PREROUTING, match the incoming >> interface and destination address and --set-xmark a firewall MARK >> unique to this interface/destination >> 2. Create a new ip route table that sets the default route to go out >> on the interface with the source address you want (same as destination >> address in iptables) >> 3. Create a new ip rule that sends all packets with firewall mark set >> in iptables to the routing table you just created >> >> Repeat above for each interface/address you need to mangle, with a >> unique firewall mark and routing table for each. >> >> It may be necessary to use CONNMARK in PREROUTING and OUTPUT to >> --restore_mark. I can't remember if this is needed or not, its been a >> while since I configured iptables with this. >> >> This should ensure that any packet that comes into an >> interface/address is replied to from the same interface/address. >> >> David >> >> >> On Sun, Feb 19, 2023 at 9:44 AM Christoph >> Loesch<wireguard-mail@chil.at> wrote: >>> Hi, >>> >>> I don't think no one wants to fix it, there are several users having >>> this issue. I rather guess no one could find a suitable solution to >>> fix it. >>> >>> @Nico: did you try to delete the affected route and add it again >>> with the correct source IP ? >>> >>> as I mentioned it >>> inhttps://lists.zx2c4.com/pipermail/wireguard/2021-November/007324.html >>> >>> ip route del <NET> >>> ip route add <NET> dev <ALIAS_DEV> src <SRC_IP> >>> >>> This way I was able to (at least temporary) fix this issue on multi >>> homed systems. >>> >>> Kind regards, >>> Christoph >>> >>> Am 19.02.2023 um 13:13 schrieb Nico Schottelius: >>>> Hey Sebastian, >>>> >>>> Sebastian Hyrwall<sh@keff.org> writes: >>>> >>>>> It is kinda. It's been mentioned multiple times over the years but >>>>> no one seems to want to fix it. Atleast you should be able to >>>>> specify bind/src ip in the >>>>> config. I gave up WG because of it. 
Wasn't accepted by my projects >>>>> security policy since src ip could not be configured. >>>>> >>>>> There is an unofficial patch however, >>>>> >>>>> https://github.com/torvalds/linux/commit/5fa98082093344c86345f9f63305cae9d5f9f281 >>>>> >>>> the binding is somewhat related to this issue and I was looking for >>>> that >>>> feature some time ago, too. While it is correlated and I would really >>>> appreciate binding support, I am not sure whether the linked patch >>>> does >>>> actually fix the problem I am seeing in multi homed devices. >>>> >>>> As long as wireguard does not reply with the same IP address it was >>>> contacted with, packets will get dropped on stateful firewalls, >>>> because >>>> the returning packet does not match the state session database. >>>> >>>> Best regards, >>>> >>>> Nico >>>> >>>> -- >>>> Sustainable and modern Infrastructures by ungleich.ch > -- This communication may not represent my employer's views, if any, on the matters discussed. [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 840 bytes --] ^ permalink raw reply [flat|nested] 34+ messages in thread
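For an ordinary IPv4 UDP service, the RFC 2181 section 4.1 rule maps fairly directly onto the Linux sockets API. The sketch below is an editor's illustration, not WireGuard code, and the helper names (`query_dst_from_cmsg`, `set_reply_src_cmsg`, `echo_from_query_dst`) are hypothetical. With the `IP_PKTINFO` socket option enabled, `recvmsg(2)` reports the header destination address of each datagram in `ipi_addr`; passing an `in_pktinfo` whose `ipi_spec_dst` holds that address to `sendmsg(2)` asks the kernel to use it as the reply's source. This is Linux-specific (the BSDs use `IP_RECVDSTADDR`/`IP_SENDSRCADDR`), and IPv6, broadcast/multicast destinations and most error handling are left out. Note that it pins only the source address: the outgoing interface is still chosen by the routing lookup, so on a multi-homed host source-based `ip rule` entries may still be needed for the reply to leave through the matching uplink.

```c
#define _GNU_SOURCE /* exposes struct in_pktinfo in glibc */
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>

/* Control buffer for exactly one IP_PKTINFO message, suitably aligned. */
union pktinfo_cbuf {
    char buf[CMSG_SPACE(sizeof(struct in_pktinfo))];
    struct cmsghdr align;
};

/* Find the IP-header destination address (ipi_addr) of a datagram that
 * was received on a socket with IP_PKTINFO enabled.
 * Returns 0 on success, -1 if no IP_PKTINFO message is present. */
static int query_dst_from_cmsg(struct msghdr *msg, struct in_addr *dst)
{
    for (struct cmsghdr *c = CMSG_FIRSTHDR(msg); c != NULL; c = CMSG_NXTHDR(msg, c)) {
        if (c->cmsg_level == IPPROTO_IP && c->cmsg_type == IP_PKTINFO) {
            struct in_pktinfo pi;
            memcpy(&pi, CMSG_DATA(c), sizeof pi);
            *dst = pi.ipi_addr;
            return 0;
        }
    }
    return -1;
}

/* Attach an IP_PKTINFO message asking the kernel to send from `src`.
 * ipi_ifindex = 0 leaves the outgoing interface to the routing lookup. */
static void set_reply_src_cmsg(struct msghdr *msg, union pktinfo_cbuf *cb,
                               struct in_addr src)
{
    struct in_pktinfo pi;
    memset(&pi, 0, sizeof pi);
    pi.ipi_spec_dst = src;

    memset(cb, 0, sizeof *cb);
    msg->msg_control = cb->buf;
    msg->msg_controllen = sizeof cb->buf;

    struct cmsghdr *c = CMSG_FIRSTHDR(msg);
    assert(c != NULL); /* the buffer always holds one cmsghdr */
    c->cmsg_level = IPPROTO_IP;
    c->cmsg_type = IP_PKTINFO;
    c->cmsg_len = CMSG_LEN(sizeof pi);
    memcpy(CMSG_DATA(c), &pi, sizeof pi);
}

/* Receive one datagram on `fd` and echo it back, using the address the
 * query was sent *to* as the reply's source address. */
static ssize_t echo_from_query_dst(int fd)
{
    char data[2048];
    struct sockaddr_in peer;
    union pktinfo_cbuf rx, tx;
    struct iovec iov = { .iov_base = data, .iov_len = sizeof data };
    struct msghdr msg = {
        .msg_name = &peer, .msg_namelen = sizeof peer,
        .msg_iov = &iov, .msg_iovlen = 1,
        .msg_control = rx.buf, .msg_controllen = sizeof rx.buf,
    };

    ssize_t n = recvmsg(fd, &msg, 0);
    if (n < 0)
        return -1;

    struct in_addr dst;
    if (query_dst_from_cmsg(&msg, &dst) != 0)
        return -1; /* IP_PKTINFO was not enabled on this socket */

    iov.iov_len = (size_t)n; /* echo exactly what was received */
    msg.msg_flags = 0;
    set_reply_src_cmsg(&msg, &tx, dst);
    return sendmsg(fd, &msg, 0);
}
```

A server would enable the option once, e.g. `int one = 1; setsockopt(fd, IPPROTO_IP, IP_PKTINFO, &one, sizeof one);`, before entering its receive loop.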
* Re: Source IP incorrect on multi homed systems
From: Nico Schottelius @ 2023-02-19 20:18 UTC
To: tlhackque; Cc: wireguard

tlhackque <tlhackque@yahoo.com> writes:

>> [...]
>> 4.1. UDP Source Address Selection
>>
>> *To avoid these problems, servers when responding to queries
>> using UDP _must_ cause the reply to be sent with the source address
>> field in the IP header set to the address that was in the
>> destination address field of the IP header of the packet containing
>> the query causing the response.*

OMG, we really have seen everything already, haven't we?

Jason, what do you think about adopting the RFC 2181 Source Address
Selection algorithm for wireguard?

If I am not mistaken that would mean in practice:

    if original_pkg.ip.dst == one_of_my_ips then
        return_pkg.ip.src = original_pkg.ip.dst
        return_pkg.ip.dst = original_pkg.ip.src
    fi

For me that sounds like a sane approach (aside from
my very simplified algorithm).

Best regards,

Nico

--
Sustainable and modern Infrastructures by ungleich.ch
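Nico's pseudocode can be written as a small C function. This is a hypothetical sketch: `reply_addrs`, `is_my_ip` and the `IPV4` macro are illustrative names, addresses are kept as host-order integers for readability, and the RFC's fallback ("any legal IP address allocated to the server") is represented by returning 0, meaning "let the kernel choose". The values in the check come from the capture in the first message of the thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Build a host-order IPv4 address from its four octets. */
#define IPV4(a, b, c, d) (((uint32_t)(a) << 24) | ((uint32_t)(b) << 16) | \
                          ((uint32_t)(c) << 8) | (uint32_t)(d))

struct addr_pair {
    uint32_t src;
    uint32_t dst;
};

/* True if `a` is one of this host's configured addresses. */
static bool is_my_ip(uint32_t a, const uint32_t *mine, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (mine[i] == a)
            return true;
    return false;
}

/* Reply to the sender, from the address the packet was sent to.
 * If that address is not one of ours (e.g. a broadcast), src = 0
 * stands for "let the kernel pick any legal source address". */
static struct addr_pair reply_addrs(struct addr_pair in,
                                    const uint32_t *mine, size_t n)
{
    struct addr_pair out;
    out.dst = in.src;
    out.src = is_my_ip(in.dst, mine, n) ? in.dst : 0;
    return out;
}
```

With the first message's capture (194.5.220.43 sending to 147.78.195.254 on a host that also owns 195.141.200.73), the reply would be addressed from 147.78.195.254 back to 194.5.220.43 rather than from 195.141.200.73.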
* Re: Source IP incorrect on multi homed systems
From: Roman Mamedov @ 2023-02-19 20:42 UTC
To: Nico Schottelius; Cc: tlhackque, wireguard

On Sun, 19 Feb 2023 21:18:34 +0100
Nico Schottelius <nico.schottelius@ungleich.ch> wrote:

> If I am not mistaken that would mean in practice:
>
>     if original_pkg.ip.dst == one_of_my_ips then
>         return_pkg.ip.src = original_pkg.ip.dst
>         return_pkg.ip.dst = original_pkg.ip.src
>     fi
>
> For me that sounds like a sane approach (aside from
> my very simplified algorithm).

Except there is no request and response in WG, and as such no original or
return packet. Another peer contacts you, then some time later you contact the
other peer. Or the other way round.

WG-wise what will need to be done is to store in each peer's information
structure the local IP that we are supposed to use for communication with that
peer, and to update it when receiving packets from the peer, using the
destination of those. So you would see a "Local IP" in each "peer" section
when doing a "wg show".

Also, until there is such an IP initially stored, it will have to be some
default outgoing IP of the system towards that peer. BTW, how would this work
in your setup, what if not the peer contacts you first, but your machine needs
to contact the peer?

--
With respect,
Roman
* Re: Source IP incorrect on multi homed systems
From: Nico Schottelius @ 2023-02-19 21:19 UTC
To: Roman Mamedov; Cc: Nico Schottelius, tlhackque, wireguard

Hey Roman,

Roman Mamedov <rm@romanrm.net> writes:

> On Sun, 19 Feb 2023 21:18:34 +0100
> Nico Schottelius <nico.schottelius@ungleich.ch> wrote:
>
>> If I am not mistaken that would mean in practice:
>>
>>     if original_pkg.ip.dst == one_of_my_ips then
>>         return_pkg.ip.src = original_pkg.ip.dst
>>         return_pkg.ip.dst = original_pkg.ip.src
>>     fi
>>
>> For me that sounds like a sane approach (aside from
>> my very simplified algorithm).
>
> Except there is no request and response in WG, and as such no original or
> return packet. Another peer contacts you, then some time later you contact the
> other peer. Or the other way round.
>
> WG-wise what will need to be done is to store in each peer's information
> structure the local IP that we are supposed to use for communication with that
> peer, and to update it when receiving packets from the peer, using the
> destination of those. So you would see a "Local IP" in each "peer" section
> when doing a "wg show".

That is very interesting, thanks for the insight. Reading the above
paragraph, I was having a very similar thought: that we need to record
the local IP.

> Also, until there is such an IP initially stored, it will have to be some
> default outgoing IP of the system towards that peer. BTW, how would this work
> in your setup, what if not the peer contacts you first, but your machine needs
> to contact the peer?

So far this situation doesn't exist for us, because only servers are
multi homed.

However, having an option to specify a local address in each peer
section would probably be a good solution to disambiguate it, and if
not specified, use the default, as in whatever other processes are
using that don't define it explicitly - i.e. follow the principle of
least surprise.

Best regards,

Nico

--
Sustainable and modern Infrastructures by ungleich.ch
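Roman's per-peer "Local IP" and Nico's optional per-peer setting can be combined into one small piece of state. This is a hypothetical sketch, not WireGuard's actual data structures: `struct peer_src`, `peer_src_learn` and `peer_src_pick` are illustrative names, and the precedence (configured address, then learned address, then 0 for the system default) is an editor's assumption about how the two ideas would fit together.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-peer source state (not WireGuard's real structs).
 * Host-order IPv4 addresses; 0 means "not set". */
struct peer_src {
    uint32_t configured; /* optional per-peer local address from config */
    uint32_t learned;    /* destination of the last packet from the peer */
};

/* Roman's idea: refresh on every authenticated packet from the peer. */
static void peer_src_learn(struct peer_src *p, uint32_t pkt_dst)
{
    p->learned = pkt_dst;
}

/* Nico's option layered on top: an explicit setting wins, otherwise the
 * learned address is used, and 0 asks for the system's default source. */
static uint32_t peer_src_pick(const struct peer_src *p)
{
    return p->configured != 0 ? p->configured : p->learned;
}
```

Before any packet has arrived and with nothing configured, `peer_src_pick` returns 0, which corresponds to Roman's "default outgoing IP of the system towards that peer".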
* Re: Source IP incorrect on multi homed systems
From: tlhackque @ 2023-02-19 22:06 UTC
To: WireGuard Mailing list

On 19-Feb-23 16:19, Nico Schottelius wrote:
> So far this situation doesn't exist for us, because only servers are
> multi homed.

It's not that uncommon; consider a docked notebook that has a WiFi
address and an Ethernet address on the same subnet. While typically
the routing priorities favor the Ethernet, the mobile will have both
addresses.

In a car, you can have WiFi through the car and mobile data. (Not
saying I like this, but...)

There are probably other cases, but I wouldn't assume it's only a
server issue.

As I also noted in another note: two servers can have the same issue,
if both are multi-homed. The solution really needs to be symmetric.
* Src addr code review (Was: Source IP incorrect on multi homed systems)
From: Daniel Gröber @ 2023-02-19 22:42 UTC
To: Nico Schottelius; Cc: Roman Mamedov, tlhackque, wireguard

Hi,

I thought it might be useful to do some quick and dirty code review
instead of speculating wildly to figure out where these source IP
selection problems could be coming from ;)

From previous code deep dives I know the udp_tunnel_xmit_skb function
is where tunnel packets get handed off to the kernel. So in
net/wireguard/socket.c:send4 we have:

    udp_tunnel_xmit_skb(rt, sock, skb, fl.saddr, fl.daddr, ds,
                        ip4_dst_hoplimit(&rt->dst), 0, fl.fl4_sport,
                        fl.fl4_dport, false, false);

Where fl.saddr is the source address that's supposedly wrong
(sometimes? I guess?) Where does that come from?

Let's look at the code (heavily culled):

    struct flowi4 fl = {
            .saddr = endpoint->src4.s_addr,
    };
    if (cache)
            rt = dst_cache_get_ip4(cache, &fl.saddr);
    if (!rt) {
            if (unlikely(!inet_confirm_addr(sock_net(sock), NULL, 0,
                                            fl.saddr, RT_SCOPE_HOST)))
                    fl.saddr = 0;
            if (unlikely(endpoint->src_if4 && ((IS_ERR(rt) &&
                         PTR_ERR(rt) == -EINVAL) || (!IS_ERR(rt) &&
                         rt->dst.dev->ifindex != endpoint->src_if4))))
                    fl.saddr = 0;

Well, it's initialized from endpoint->src4.s_addr, overwritten with
zero in some cases, which I believe lets the kernel do its regular
source addr selection, and populated from something called dst_cache
at some callsites.

@Nico could it perhaps simply be that you're hitting one of these
zeroing cases and that's why it's using regular kernel src addr
selection instead of the cached endpoint src4 address?

The first case !inet_confirm_addr(..., RT_SCOPE_HOST) ought to confirm
that the saddr is actually still a local address. Makes sense: if the
address we remembered was removed from the interface we can't use it
anymore.

The second case looks like it's checking if the (sometimes cached)
src_if4 interface index is still what the route we're about to use
points to.

If neither of those seem likely we can keep reading :)

--Daniel
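The two zeroing conditions Daniel points at can be modelled as a pure function of their inputs. This is an editor's sketch, not kernel code: `struct send4_in` and `model_send4_saddr` are hypothetical, the booleans and integers stand in for the results of `inet_confirm_addr()` and the route lookup, and only the `if (!rt)` branch is modelled (a `dst_cache` hit skips both checks). If this reading is right, the capture from the first message (request in on net1, reply out on net2) would satisfy the second case: the packet arrived on one interface, the route back to the peer leaves through another, `fl.saddr` is zeroed, and the kernel then picks net2's address as the source.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Inputs to a simplified model of the `if (!rt)` branch of send4(). */
struct send4_in {
    uint32_t src4;          /* endpoint->src4.s_addr (remembered source) */
    int      src_if4;       /* endpoint->src_if4, 0 if unknown */
    bool     src4_is_local; /* inet_confirm_addr(..., RT_SCOPE_HOST) != 0 */
    int      route_err;     /* 0 if the route lookup succeeded, else -errno */
    int      route_ifindex; /* rt->dst.dev->ifindex when route_err == 0 */
};

/* Returns the saddr handed on; 0 = kernel's normal source selection. */
static uint32_t model_send4_saddr(const struct send4_in *in)
{
    uint32_t saddr = in->src4;

    /* Case 1: the remembered address is no longer a local address. */
    if (!in->src4_is_local)
        saddr = 0;

    /* Case 2: the receiving interface is known, and the route is either
     * rejected with -EINVAL or leaves through a different interface. */
    if (in->src_if4 != 0 &&
        (in->route_err == -EINVAL ||
         (in->route_err == 0 && in->route_ifindex != in->src_if4)))
        saddr = 0;

    return saddr;
}
```

In the check below, ifindex 2 and 3 are made-up stand-ins for net1 and net2.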
* Re: Src addr code review (Was: Source IP incorrect on multi homed systems)
From: 曹煜 @ 2023-02-20 0:28 UTC
To: Daniel Gröber; Cc: Nico Schottelius, Roman Mamedov, tlhackque, wireguard

Hi all,
I hacked that source code myself months ago, and it works well for my
use case (I have 4 dual-stack PPPoE WANs set up on my OpenWrt router,
and set up a WireGuard server on it). My hack picks up the dst_addr
from the incoming handshake packet in the kernel sk_buff, and then
uses that addr as src_addr to reply.
I'm not good at source code, and I know that my hack may be ugly, but
it works; I hope this patch can help:
https://github.com/openwrt/packages/issues/9538#issuecomment-1150592803

Daniel Gröber <dxld@darkboxed.org> wrote on Mon, 20 Feb 2023 at 06:42:
>
> Hi,
>
> I thought it might be useful to do some quick and dirty code review
> instead of speculating wildly to figure out where these source IP
> selection problems could be coming from ;)
>
> [...]
* Re: Src addr code review (Was: Source IP incorrect on multi homed systems)
From: Nico Schottelius @ 2023-02-20 10:40 UTC
To: 曹煜; Cc: Daniel Gröber, Nico Schottelius, Roman Mamedov, tlhackque, wireguard

Hello 曹煜,

on GitHub it seems your patch was applied / the issue was closed - is
that the correct current status?

Best regards,

Nico

曹煜 <cao88yu@gmail.com> writes:

> Hi all,
> I hacked that source code myself months ago, and it works well for my
> use case (I have 4 dual-stack PPPoE WANs set up on my OpenWrt router,
> and set up a WireGuard server on it). My hack picks up the dst_addr
> from the incoming handshake packet in the kernel sk_buff, and then
> uses that addr as src_addr to reply.
> I'm not good at source code, and I know that my hack may be ugly, but
> it works; I hope this patch can help:
> https://github.com/openwrt/packages/issues/9538#issuecomment-1150592803
>
> [...]

--
Sustainable and modern Infrastructures by ungleich.ch
* Re: Src addr code review (Was: Source IP incorrect on multi homed systems)
From: 曹煜 @ 2023-02-20 11:21 UTC
To: Nico Schottelius; Cc: Daniel Gröber, Roman Mamedov, tlhackque, wireguard

Hi Nico,
I closed that issue myself, but the patch was not applied, because the
issue comes from WireGuard itself, and the maintainer told me that I
should send my patch to WireGuard upstream (but I just gave up on
sending it to the WireGuard team).

Nico Schottelius <nico.schottelius@ungleich.ch> wrote on Mon, 20 Feb 2023 at 18:41:
>
> Hello 曹煜,
>
> on GitHub it seems your patch was applied / the issue was closed - is
> that the correct current status?
>
> Best regards,
>
> Nico
>
> [...]
* Re: Src addr code review (Was: Source IP incorrect on multi homed systems)
From: Nico Schottelius @ 2023-02-20 9:47 UTC
To: Daniel Gröber; Cc: Nico Schottelius, Roman Mamedov, tlhackque, wireguard

Hey Daniel,

thanks a lot for diving in ...

Daniel Gröber <dxld@darkboxed.org> writes:

> Let's look at the code (heavily culled):
>
>     struct flowi4 fl = {
>             .saddr = endpoint->src4.s_addr,
>     };
>     if (cache)
>             rt = dst_cache_get_ip4(cache, &fl.saddr);

What I am wondering is, how did it get into the cache in the first
place?

> [...]
>
> @Nico could it perhaps simply be that you're hitting one of these
> zeroing cases and that's why it's using regular kernel src addr
> selection instead of the cached endpoint src4 address?

That could absolutely be the case. What is funky is that I see the
problem on two very different systems, but maybe it's a good time to
elaborate on this:

- System A:
  - WireGuard module loaded on the host
  - WireGuard wg-quick used within a Kubernetes pod that has
    permissions for managing WireGuard
  - The same pod also runs bird for BGP peering
- System B:
  - WireGuard running as wireguard-go on OPNsense / FreeBSD
  - BGP running with frr

Both systems exhibit the behaviour, but maybe it's better to focus on
System A first, as this seems to be more the "upstream" source.

Best regards,

Nico

--
Sustainable and modern Infrastructures by ungleich.ch
* Re: Src addr code review (Was: Source IP incorrect on multi homed systems)
From: dxld @ 2023-02-20 20:43 UTC
To: Nico Schottelius; Cc: Roman Mamedov, tlhackque, wireguard

Hi Nico,

On Mon, Feb 20, 2023 at 10:47:36AM +0100, Nico Schottelius wrote:
> Daniel Gröber <dxld@darkboxed.org> writes:
> > Let's look at the code (heavily culled):
> >
> >     struct flowi4 fl = {
> >             .saddr = endpoint->src4.s_addr,
> >     };
> >     if (cache)
> >             rt = dst_cache_get_ip4(cache, &fl.saddr);
>
> What I am wondering is, how did it get into the cache in the first
> place?

Right so, endpoint->src4 is set in wg_socket_set_peer_endpoint, which
is called either through wg_socket_endpoint_from_skb in the handshake
receive code or wg_socket_set_peer_endpoint in the data path. The
_from_skb variant also calls wg_socket_endpoint_from_skb.

Here we're remembering the src addr of the (received) packet in addr4,
and its dst addr, which we're going to use as the source when sending,
in src4, as you'd expect:

    endpoint->addr4.sin_family = AF_INET;
    endpoint->addr4.sin_port = udp_hdr(skb)->source;
    endpoint->addr4.sin_addr.s_addr = ip_hdr(skb)->saddr;
    endpoint->src4.s_addr = ip_hdr(skb)->daddr;
    endpoint->src_if4 = skb->skb_iif;

The dst_cache is set just after those zeroing conditionals we were
looking at before. It's cleared whenever the endpoint/port changes or
one of those cases is hit. Note the dst_cache is only used for data
packets, so handshakes would be unaffected if it was the cause of your
woes.

> > @Nico could it perhaps simply be that you're hitting one of these
> > zeroing cases and that's why it's using regular kernel src addr
> > selection instead of the cached endpoint src4 address?
>
> That could absolutely be the case. What is funky is that I see the
> problem on two very different systems
>
> Both systems exhibit the behaviour, but maybe it's better to focus on
> System A first, as this seems to be more the "upstream" source.

It is weird indeed, but yeah. One thing at a time.

BTW, what kernel version/distro are we dealing with?

--Daniel
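The bookkeeping Daniel describes (peer address and port taken from the received packet's source, our address from its destination, the arrival interface from `skb_iif`, and the cached route dropped when the endpoint changes) can be mirrored in a few lines. This is a simplified, hypothetical model, not the kernel's code: the struct and function names are made up, ports are kept in host order, and treating a change of `src4` or `src_if4` as an endpoint change that invalidates the cache is the editor's assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* What the receive path sees of an incoming packet (host order). */
struct rx_hdr {
    uint32_t saddr, daddr;
    uint16_t sport;
    int      iif;       /* stands in for skb->skb_iif */
};

/* Simplified mirror of the endpoint fields quoted above. */
struct endpoint_model {
    uint32_t addr4;     /* peer address  = packet source */
    uint16_t port;      /* peer port     = packet source port */
    uint32_t src4;      /* our address   = packet destination */
    int      src_if4;   /* interface the packet arrived on */
};

struct peer_model {
    struct endpoint_model ep;
    bool cache_valid;   /* stands in for the per-peer dst_cache */
};

static struct endpoint_model endpoint_from_rx(const struct rx_hdr *h)
{
    struct endpoint_model e = {
        .addr4 = h->saddr, .port = h->sport,
        .src4 = h->daddr, .src_if4 = h->iif,
    };
    return e;
}

/* Store the endpoint; drop the cached route only if something changed. */
static void set_peer_endpoint(struct peer_model *p, const struct endpoint_model *e)
{
    if (p->ep.addr4 == e->addr4 && p->ep.port == e->port &&
        p->ep.src4 == e->src4 && p->ep.src_if4 == e->src_if4)
        return;
    p->ep = *e;
    p->cache_valid = false;
}
```

Replaying the first message's handshake (194.5.220.43:60770 to 147.78.195.254) stores 147.78.195.254 as `src4`; a repeat of the same packet leaves the cache alone, while a packet to 195.141.200.73 on another interface replaces `src4` and invalidates the cache.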
* Re: Source IP incorrect on multi homed systems
From: tlhackque @ 2023-02-19 21:39 UTC
To: Roman Mamedov, Nico Schottelius; Cc: wireguard

On 19-Feb-23 15:42, Roman Mamedov wrote:
> On Sun, 19 Feb 2023 21:18:34 +0100
> Nico Schottelius <nico.schottelius@ungleich.ch> wrote:
>
>> If I am not mistaken that would mean in practice:
>>
>>     if original_pkg.ip.dst == one_of_my_ips then
>>         return_pkg.ip.src = original_pkg.ip.dst
>>         return_pkg.ip.dst = original_pkg.ip.src
>>     fi
>>
>> For me that sounds like a sane approach (aside from
>> my very simplified algorithm).
> Except there is no request and response in WG, and as such no original or
> return packet. Another peer contacts you, then some time later you contact the
> other peer. Or the other way round.
>
> WG-wise what will need to be done is to store in each peer's information
> structure the local IP that we are supposed to use for communication with that
> peer, and to update it when receiving packets from the peer, using the
> destination of those. So you would see a "Local IP" in each "peer" section
> when doing a "wg show".
>
> Also, until there is such an IP initially stored, it will have to be some
> default outgoing IP of the system towards that peer. BTW, how would this work
> in your setup, what if not the peer contacts you first, but your machine needs
> to contact the peer?
>
The situation can be (and often is) the same for both peers.

If you're the initiator, you send to the peer address using its
configured or DNS IP address, and normal routing. You note the address
used to send, and use it for future communications to that peer. The
first packet sets state in the posited firewall/NAT. Subsequent packets
using the same source address ensure that the firewall sees them as
the same flow.

When the peer gets around to saying something - which it will at the
latest when the keepalive timer goes off, but probably sooner - it will
have noted your source address and its local IP address (the one you
used). So it will send using the source address that you know about.
This is the same algorithm used by the peer, so they should agree.

When either end detects an address change, the process restarts.

There is a possibility that the initial packets pass in flight, but I
think that would at most result in a dropped packet, which will be
resent. I don't think there's a deadlock, but in the event of
thrashing, a tie-breaker of using the lowest candidate IP address
generally works.

When there are multiple choices, it doesn't really matter which pair
of IP addresses are picked, as long as they're stable while the systems
reside on the same networks. (E.g. it could be two notebook PCs in
different hotel rooms, not just two fixed servers or one fixed server
and one mobile.)

The goal is to establish a flow that stateful packet inspection, NAT,
and routing can recognize and use to keep a pinhole open...

I don't have time at the moment to work out the corner cases, but
that's the overall approach.
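The symmetric approach described above can be sketched per peer as two operations: when sending, reuse the noted local address or, as the initiator, note whatever source normal routing picked; when receiving, note which of our addresses the peer used and report whether the pair changed (the "restart" case). This is a hypothetical sketch with made-up names (`struct flow`, `flow_source_for_send`, `flow_on_receive`); the lowest-address tie-breaker for thrashing is deliberately not modelled, and the initiator address 198.51.100.1 in the check is a made-up documentation address.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-peer flow state; host-order IPv4, 0 = unset. */
struct flow {
    uint32_t local;   /* our address used with this peer */
    uint32_t remote;  /* the peer's address */
};

/* Sending: reuse the noted local address; if none is known yet (we are
 * the initiator), take the source normal routing picked and note it. */
static uint32_t flow_source_for_send(struct flow *f, uint32_t peer,
                                     uint32_t routed_src)
{
    if (f->local == 0) {
        f->local = routed_src;
        f->remote = peer;
    }
    return f->local;
}

/* Receiving: note which of our addresses the peer used and where the
 * packet came from. Returns true if the pair changed ("restart"). */
static bool flow_on_receive(struct flow *f, uint32_t pkt_src, uint32_t pkt_dst)
{
    bool changed = f->local != pkt_dst || f->remote != pkt_src;
    f->local = pkt_dst;
    f->remote = pkt_src;
    return changed;
}
```

In the two-peer check below, B's own routing would prefer 195.141.200.73, but because A reached it at 147.78.195.254, B answers from that address and A sees no change, so both ends settle on the same pair.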
* Re: Source IP incorrect on multi homed systems
From: Nico Schottelius @ 2023-02-19 20:02 UTC
To: Christoph Loesch; Cc: wireguard

Hello Christoph,

Christoph Loesch <wireguard-mail@chil.at> writes:

> @Nico: did you try to delete the affected route and add it again with
> the correct source IP?

No, I did not, because the routes are really dynamic on the affected
systems and I would need to overwrite the BGP routes with a better
metric, which in turn will likely break the return path.

> As I mentioned in
> https://lists.zx2c4.com/pipermail/wireguard/2021-November/007324.html
>
>     ip route del <NET>
>     ip route add <NET> dev <ALIAS_DEV> src <SRC_IP>
>
> This way I was able to (at least temporarily) fix this issue on multi
> homed systems.

Much appreciate the hint. However, changing routes manually on as many
routers/VPN endpoints as we have is not a practical solution.

To fix the current project's issue we have shifted the VPN endpoint to
a single homed device for the moment.

Best regards,

Nico

--
Sustainable and modern Infrastructures by ungleich.ch
Thread overview: 34+ messages
2023-02-18 20:14 Source IP incorrect on multi homed systems Nico Schottelius
[not found] ` <CAHx9msc1cNV80YU7HRmQ9gsjSEiVZ=pb31aYqfP62hy8DeuGZA@mail.gmail.com>
2023-02-18 22:34 ` Nico Schottelius
2023-02-19 0:45 ` Mike O'Connor
2023-02-19 8:01 ` Nico Schottelius
2023-02-19 9:19 ` Mikma
2023-02-19 12:04 ` Nico Schottelius
2023-02-19 12:10 ` Nico Schottelius
2023-02-19 18:59 ` Peter Linder
[not found] ` <2ed829aaed9fec59ac2a9b32c4ce0a9005b8d8b850be81c81a226791855fe4eb@mu.id>
2023-02-19 12:13 ` Nico Schottelius
2023-02-19 14:39 ` Christoph Loesch
2023-02-19 16:32 ` David Kerr
2023-02-19 16:54 ` Sebastian Hyrvall
2023-02-19 18:04 ` Janne Johansson
2023-02-19 18:08 ` Sebastian Hyrvall
2023-02-19 20:11 ` Nico Schottelius
2023-02-19 17:05 ` tlhackque
[not found] ` <CADGd2DoE6TCtCxxWL7JWyNW5+yy_Pe+9MNzHznbudMWLTXQreA@mail.gmail.com>
2023-02-19 18:30 ` Fwd: " John Lauro
2023-02-19 22:28 ` tlhackque
2023-02-20 0:58 ` Luiz Angelo Daros de Luca
2023-02-19 18:37 ` David Kerr
2023-02-19 18:52 ` tlhackque
2023-02-19 18:42 ` tlhackque
2023-02-19 20:18 ` Nico Schottelius
2023-02-19 20:42 ` Roman Mamedov
2023-02-19 21:19 ` Nico Schottelius
2023-02-19 22:06 ` tlhackque
2023-02-19 22:42 ` Src addr code review (Was: Source IP incorrect on multi homed systems) Daniel Gröber
2023-02-20 0:28 ` 曹煜
2023-02-20 10:40 ` Nico Schottelius
2023-02-20 11:21 ` 曹煜
2023-02-20 9:47 ` Nico Schottelius
2023-02-20 20:43 ` dxld
2023-02-19 21:39 ` Source IP incorrect on multi homed systems tlhackque
2023-02-19 20:02 ` Nico Schottelius