Development discussion of WireGuard
* Source IP incorrect on multi homed systems
@ 2023-02-18 20:14 Nico Schottelius
       [not found] ` <CAHx9msc1cNV80YU7HRmQ9gsjSEiVZ=pb31aYqfP62hy8DeuGZA@mail.gmail.com>
  2023-02-19  0:45 ` Mike O'Connor
  0 siblings, 2 replies; 35+ messages in thread
From: Nico Schottelius @ 2023-02-18 20:14 UTC (permalink / raw)
  To: WireGuard mailing list


Dear group,

I was wondering how WireGuard [Linux kernel] or wireguard-go [FreeBSD]
is supposed to decide which source IP address to use when replying?

On both FreeBSD and Linux I have seen that WireGuard seems to use the IP
address of the outgoing interface, i.e. the interface holding the route
back to the sender. In multi-homed setups, however, this can be wrong.
Take this example:

      19:57:24.607526 net1  In  IP 194.5.220.43.60770 > 147.78.195.254.51820: UDP, length 148
      19:57:24.608358 net2  Out IP 195.141.200.73.51820 > 194.5.220.43.60770: UDP, length 92

The initiator sends from 194.5.220.43 to the receiver 147.78.195.254.
WireGuard then replies with the source IP 195.141.200.73 instead of
147.78.195.254.

As the node is multi-homed, the reply might leave through any of its
uplinks and thus arrive with an arbitrary (unexpected) source address.
Such a packet does not match the NAT rules on firewalls along the way
and is finally dropped. For instance, in the example above the firewall
drops the packet from 195.141.200.73, because it has no session entry
for that address.
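
To make the failure mode concrete, here is a toy Python model of the
session check a stateful firewall or NAT box performs. It is a sketch
under simplifying assumptions (a flow is just the UDP 4-tuple seen on
the public side; `Flow`, `reverse` and `StatefulFirewall` are
hypothetical names), not how any particular firewall is implemented.
The addresses and ports are the ones from the trace above.

```python
from typing import NamedTuple, Set


class Flow(NamedTuple):
    """A UDP flow as a stateful firewall might key it (hypothetical model)."""
    src: str
    sport: int
    dst: str
    dport: int


def reverse(flow: Flow) -> Flow:
    """The flow a matching reply must have: both endpoints swapped."""
    return Flow(flow.dst, flow.dport, flow.src, flow.sport)


class StatefulFirewall:
    """Toy firewall: records outbound flows and admits only inbound
    packets that are the exact reverse of a recorded flow."""

    def __init__(self) -> None:
        self.sessions: Set[Flow] = set()

    def outbound(self, flow: Flow) -> None:
        self.sessions.add(flow)

    def inbound_allowed(self, flow: Flow) -> bool:
        return reverse(flow) in self.sessions


fw = StatefulFirewall()
# Handshake initiation from the client's public address to the server.
fw.outbound(Flow("194.5.220.43", 60770, "147.78.195.254", 51820))
# A reply sourced from the contacted address matches the session ...
print(fw.inbound_allowed(Flow("147.78.195.254", 51820, "194.5.220.43", 60770)))  # True
# ... a reply sourced from the other uplink's address does not.
print(fw.inbound_allowed(Flow("195.141.200.73", 51820, "194.5.220.43", 60770)))  # False
```

Only the source address differs between the two replies, and that alone
decides whether the packet gets through.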

I have observed this behaviour both on Linux 6.1.11 and with
wireguard-go 0.0.20220316_8,1 on FreeBSD. In both cases the connection
breaks depending on which interface is currently used as the exit.

I would argue that WireGuard should by default invert the IP addresses
of the incoming packet, i.e. set src=dst and dst=src, and reply with
that, instead of picking an interface-specific address. Or is there a
good reason for the current behaviour?
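
The proposed rule (reply with source and destination swapped) can be
checked mechanically against a capture. A minimal Python sketch,
assuming tcpdump's default one-line format for IPv4 TCP/UDP as in the
trace above; the regular expression and the helper names `endpoints`
and `reply_swaps_request` are illustrative, not part of any tool:

```python
import re
from typing import Tuple

# Address part of a one-line tcpdump record for IPv4 TCP/UDP, e.g.
# "net1  In  IP 194.5.220.43.60770 > 147.78.195.254.51820: UDP, length 148"
_ENDPOINTS = re.compile(
    r"IP (?P<src>\d+(?:\.\d+){3})\.(?P<sport>\d+) > "
    r"(?P<dst>\d+(?:\.\d+){3})\.(?P<dport>\d+):"
)


def endpoints(line: str) -> Tuple[str, int, str, int]:
    """Extract (src, sport, dst, dport) from a tcpdump line."""
    m = _ENDPOINTS.search(line)
    if m is None:
        raise ValueError(f"not an IPv4 TCP/UDP tcpdump line: {line!r}")
    return m["src"], int(m["sport"]), m["dst"], int(m["dport"])


def reply_swaps_request(request: str, reply: str) -> bool:
    """True if the reply uses the request's endpoints with src and dst swapped."""
    src, sport, dst, dport = endpoints(request)
    return endpoints(reply) == (dst, dport, src, sport)


req = "19:57:24.607526 net1  In  IP 194.5.220.43.60770 > 147.78.195.254.51820: UDP, length 148"
rep = "19:57:24.608358 net2  Out IP 195.141.200.73.51820 > 194.5.220.43.60770: UDP, length 92"
print(endpoints(rep)[0])              # 195.141.200.73
print(reply_swaps_request(req, rep))  # False
```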

Best regards,

Nico

--
Sustainable and modern Infrastructures by ungleich.ch


* Re: Source IP incorrect on multi homed systems
       [not found] ` <CAHx9msc1cNV80YU7HRmQ9gsjSEiVZ=pb31aYqfP62hy8DeuGZA@mail.gmail.com>
@ 2023-02-18 22:34   ` Nico Schottelius
  0 siblings, 0 replies; 35+ messages in thread
From: Nico Schottelius @ 2023-02-18 22:34 UTC (permalink / raw)
  To: Omkhar Arasaratnam; +Cc: Nico Schottelius, WireGuard mailing list


Hello Omkhar,

I tend to disagree. The problem is not the routing but the selected
source address, which is independent of routing. To be more specific:
as there is BGP routing on all interfaces, 147.78.195.254 is an
accepted IP address on any interface.

Best regards,

Nico

Omkhar Arasaratnam <omkhar@gmail.com> writes:

> This looks like an asymmetric routing issue from what you’re describing, not a wireguard issue.
>
> You may want to look into policy based routing to address it.



* Re: Source IP incorrect on multi homed systems
  2023-02-18 20:14 Source IP incorrect on multi homed systems Nico Schottelius
       [not found] ` <CAHx9msc1cNV80YU7HRmQ9gsjSEiVZ=pb31aYqfP62hy8DeuGZA@mail.gmail.com>
@ 2023-02-19  0:45 ` Mike O'Connor
  2023-02-19  8:01   ` Nico Schottelius
  1 sibling, 1 reply; 35+ messages in thread
From: Mike O'Connor @ 2023-02-19  0:45 UTC (permalink / raw)
  To: Nico Schottelius, WireGuard mailing list

Generally, when sending from a local process, all OSes use the address
of the outgoing interface as the packet's source.

If the packet is forwarded and no NAT is used, the packet will be routed
via the interface suggested by the routing table.
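
Route-based source selection as described above can be illustrated with
a deliberately simplified Python model: longest-prefix match over a
route table, then the route's preferred source if one is set, else the
egress interface's address. The real kernel logic has more rules than
this, and the route table below is a hypothetical reconstruction from
the `ip a` and `ip route get` output posted later in this thread.

```python
import ipaddress
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Route:
    prefix: str                    # destination prefix, e.g. "0.0.0.0/0"
    dev: str                       # egress interface
    prefsrc: Optional[str] = None  # optional preferred source ("src" in iproute2)


# Hypothetical: primary IPv4 address per interface, taken from the "ip a" output.
IFACE_ADDR: Dict[str, str] = {"net1": "147.78.195.254", "net2": "195.141.200.73"}


def select_source(routes: List[Route], dst: str,
                  iface_addr: Dict[str, str] = IFACE_ADDR) -> Tuple[str, str]:
    """Toy source selection: longest-prefix match, then the route's preferred
    source if set, otherwise the egress interface's address."""
    target = ipaddress.ip_address(dst)
    matches = [r for r in routes if target in ipaddress.ip_network(r.prefix)]
    if not matches:
        raise LookupError(f"no route to {dst}")
    best = max(matches, key=lambda r: ipaddress.ip_network(r.prefix).prefixlen)
    return best.dev, best.prefsrc or iface_addr[best.dev]


# Connected routes plus a (BGP-learned) default via the net2 uplink.
routes = [
    Route("147.78.195.224/27", "net1"),
    Route("195.141.200.72/31", "net2"),
    Route("0.0.0.0/0", "net2"),
]
print(select_source(routes, "194.5.220.43"))  # ('net2', '195.141.200.73')
```

If BGP later prefers the net1 uplink, the same lookup with
`Route("0.0.0.0/0", "net1")` as the default returns
`('net1', '147.78.195.254')`: in this model the reply source follows the
current best route, not the address the peer contacted.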

So routing of locally generated traffic can be a real pain; policy-based
routing is one option. The other option could be to set up an 'output'
NAT to an address which is multi-homed.
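
Policy-based routing, suggested here and by Omkhar, chooses a routing
table from packet attributes such as the source address (on Linux,
rules of the form `ip rule add from <prefix> table <table>`). The toy
Python model below uses hypothetical table names and rules. It shows
that such a rule keys on the source the sender already picked: it can
steer a reply sourced from 147.78.195.254 out of net1, but it cannot
repair a reply that was already given 195.141.200.73 as its source.

```python
import ipaddress
from typing import Dict, List, Tuple

# Hypothetical rules, checked in order: (source prefix, table name), in the
# spirit of "ip rule add from <prefix> table <table>".
RULES: List[Tuple[str, str]] = [("147.78.195.254/32", "uplink1")]

# Hypothetical tables, reduced to the egress interface of their default route.
TABLES: Dict[str, str] = {"uplink1": "net1", "main": "net2"}


def egress_for(src: str, rules: List[Tuple[str, str]] = RULES,
               tables: Dict[str, str] = TABLES) -> str:
    """Return the egress interface for a packet with source address `src`:
    the first rule whose prefix contains `src` wins, else the main table."""
    addr = ipaddress.ip_address(src)
    for prefix, table in rules:
        if addr in ipaddress.ip_network(prefix):
            return tables[table]
    return tables["main"]


print(egress_for("147.78.195.254"))  # net1
print(egress_for("195.141.200.73"))  # net2
```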

I have a system running that is multi-homed without issue, other than
the actual routing machine. This machine is BGP-connected to three
locations.

There is no NAT set up, because I also add the WireGuard link
addresses to the BGP sessions.

Cheers




* Re: Source IP incorrect on multi homed systems
  2023-02-19  0:45 ` Mike O'Connor
@ 2023-02-19  8:01   ` Nico Schottelius
  2023-02-19  9:19     ` Mikma
                       ` (2 more replies)
  0 siblings, 3 replies; 35+ messages in thread
From: Nico Schottelius @ 2023-02-19  8:01 UTC (permalink / raw)
  To: Mike O'Connor; +Cc: Nico Schottelius, WireGuard mailing list


Let me rephrase the problem statement:

    - ping and HTTP requests to the multi-homed machine work correctly:
      I can ping 147.78.195.254 and the reply comes from the same address.
      I can ping 195.141.200.73 and the reply comes from the same address.
      I can curl 147.78.195.254 and the reply comes from the same address.
      I can curl 195.141.200.73 and the reply comes from the same address.

    - WireGuard does NOT work, because it changes the reply address:
      a packet sent to 147.78.195.254 is answered from 195.141.200.73.

In general, processes reply from the IP address that was used to contact
them, not from the outgoing interface's address; the latter would also
break setups that put IP addresses on the loopback interface.
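
For UDP, replying from the address that was contacted is something the
application has to arrange itself when its socket is bound to the
wildcard address: an accepted TCP socket is already bound to the
contacted address, but an unconnected UDP socket is not. On Linux the
usual mechanism is the `IP_PKTINFO` socket option: `recvmsg()` then
reports each packet's destination address, and passing that address back
in an `IP_PKTINFO` control message to `sendmsg()` makes the kernel use it
as the source. The Python sketch below shows that pattern under those
assumptions; it is a generic illustration, not WireGuard's code, and
`serve_once`, `parse_pktinfo` and `pack_pktinfo` are hypothetical names.

```python
import socket
import struct
from typing import Tuple

# IP_PKTINFO is 8 in <linux/in.h>; newer Pythons also expose socket.IP_PKTINFO.
IP_PKTINFO = getattr(socket, "IP_PKTINFO", 8)

# struct in_pktinfo { int ipi_ifindex; struct in_addr ipi_spec_dst; struct in_addr ipi_addr; };
_IN_PKTINFO = struct.Struct("=i4s4s")


def parse_pktinfo(data: bytes) -> Tuple[int, str, str]:
    """Decode a received IP_PKTINFO message into
    (ifindex, local address, destination address from the IP header)."""
    ifindex, spec_dst, addr = _IN_PKTINFO.unpack(data[:_IN_PKTINFO.size])
    return ifindex, socket.inet_ntoa(spec_dst), socket.inet_ntoa(addr)


def pack_pktinfo(source: str) -> bytes:
    """Encode an IP_PKTINFO message for sendmsg() asking for `source` as the
    source address; ifindex 0 leaves the egress interface to the routing table."""
    return _IN_PKTINFO.pack(0, socket.inet_aton(source), bytes(4))


def serve_once(port: int) -> None:
    """Echo one datagram back, sourced from the address it was sent to (Linux)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sock.setsockopt(socket.IPPROTO_IP, IP_PKTINFO, 1)  # report dst address
        sock.bind(("0.0.0.0", port))  # wildcard bind, like a multi-homed server
        data, ancdata, _flags, peer = sock.recvmsg(
            2048, socket.CMSG_SPACE(_IN_PKTINFO.size))
        cmsgs = []
        for level, ctype, cdata in ancdata:
            if level == socket.IPPROTO_IP and ctype == IP_PKTINFO:
                _ifindex, _local, contacted = parse_pktinfo(cdata)
                # Swap: the address the peer contacted becomes our source.
                cmsgs = [(socket.IPPROTO_IP, IP_PKTINFO, pack_pktinfo(contacted))]
        sock.sendmsg([data], cmsgs, 0, peer)
    finally:
        sock.close()
```

Because the interface index is left at 0, a reply to a packet received
for 147.78.195.254 would still leave via whatever interface the routing
table picks (net2 here), but with 147.78.195.254 as its source, which is
the pattern the ping and curl captures show.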

For full details, see the IP addresses [0], the routing [1] and the
tests executed [2] below.

I believe that this is a bug in WireGuard.

--------------------------------------------------------------------------------

[2]

Let's look at it in detail:

1) ping to 147.78.195.254: works

[9:14] nb3:~% ping -c2 147.78.195.254
PING 147.78.195.254 (147.78.195.254) 56(84) bytes of data.
64 bytes from 147.78.195.254: icmp_seq=1 ttl=53 time=7.27 ms
64 bytes from 147.78.195.254: icmp_seq=2 ttl=53 time=6.30 ms

--- 147.78.195.254 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1002ms
rtt min/avg/max/mdev = 6.296/6.781/7.267/0.485 ms

/ # tcpdump -ni any host 194.5.220.43
tcpdump: data link type LINUX_SLL2
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on any, link-type LINUX_SLL2 (Linux cooked v2), snapshot length 262144 bytes
08:14:48.379618 net1  In  IP 194.5.220.43 > 147.78.195.254: ICMP echo request, id 89, seq 1, length 64
08:14:48.379651 net2  Out IP 147.78.195.254 > 194.5.220.43: ICMP echo reply, id 89, seq 1, length 64
08:14:49.380340 net1  In  IP 194.5.220.43 > 147.78.195.254: ICMP echo request, id 89, seq 2, length 64
08:14:49.380392 net2  Out IP 147.78.195.254 > 194.5.220.43: ICMP echo reply, id 89, seq 2, length 64

2) ping to 195.141.200.73

[9:14] nb3:~% ping -c2 195.141.200.73
PING 195.141.200.73 (195.141.200.73) 56(84) bytes of data.
64 bytes from 195.141.200.73: icmp_seq=1 ttl=53 time=11.3 ms
64 bytes from 195.141.200.73: icmp_seq=2 ttl=53 time=6.81 ms

--- 195.141.200.73 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1002ms
rtt min/avg/max/mdev = 6.813/9.057/11.301/2.244 ms
[9:15] nb3:~%
/ # tcpdump -ni any host 194.5.220.43
tcpdump: data link type LINUX_SLL2
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on any, link-type LINUX_SLL2 (Linux cooked v2), snapshot length 262144 bytes
08:16:19.257697 net2  In  IP 194.5.220.43 > 195.141.200.73: ICMP echo request, id 91, seq 1, length 64
08:16:19.257730 net2  Out IP 195.141.200.73 > 194.5.220.43: ICMP echo reply, id 91, seq 1, length 64
08:16:20.250948 net2  In  IP 194.5.220.43 > 195.141.200.73: ICMP echo request, id 91, seq 2, length 64
08:16:20.250980 net2  Out IP 195.141.200.73 > 194.5.220.43: ICMP echo reply, id 91, seq 2, length 64

3) http to 147.78.195.254

[9:16] nb3:~% curl -s 147.78.195.254 > /dev/null ; echo $?
0
/ # tcpdump -ni any host 194.5.220.43
tcpdump: data link type LINUX_SLL2
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on any, link-type LINUX_SLL2 (Linux cooked v2), snapshot length 262144 bytes
08:17:04.082945 net1  In  IP 194.5.220.43.39274 > 147.78.195.254.80: Flags [S], seq 1405408358, win 64240, options [mss 1460,sackOK,TS val 1380610701 ecr 0,nop,wscale 7], length 0
08:17:04.082983 net2  Out IP 147.78.195.254.80 > 194.5.220.43.39274: Flags [S.], seq 3790092363, ack 1405408359, win 65160, options [mss 1460,sackOK,TS val 520503591 ecr 1380610701,nop,wscale 7], length 0
08:17:04.089996 net1  In  IP 194.5.220.43.39274 > 147.78.195.254.80: Flags [.], ack 1, win 502, options [nop,nop,TS val 1380610709 ecr 520503591], length 0
08:17:04.090121 net1  In  IP 194.5.220.43.39274 > 147.78.195.254.80: Flags [P.], seq 1:79, ack 1, win 502, options [nop,nop,TS val 1380610709 ecr 520503591], length 78: HTTP: GET / HTTP/1.1
08:17:04.090136 net2  Out IP 147.78.195.254.80 > 194.5.220.43.39274: Flags [.], ack 79, win 509, options [nop,nop,TS val 520503598 ecr 1380610709], length 0
08:17:04.090301 net2  Out IP 147.78.195.254.80 > 194.5.220.43.39274: Flags [P.], seq 1:239, ack 79, win 509, options [nop,nop,TS val 520503598 ecr 1380610709], length 238: HTTP: HTTP/1.1 200 OK
08:17:04.090381 net2  Out IP 147.78.195.254.80 > 194.5.220.43.39274: Flags [P.], seq 239:854, ack 79, win 509, options [nop,nop,TS val 520503598 ecr 1380610709], length 615: HTTP
08:17:04.096058 net1  In  IP 194.5.220.43.39274 > 147.78.195.254.80: Flags [.], ack 239, win 501, options [nop,nop,TS val 1380610715 ecr 520503598], length 0
08:17:04.096059 net1  In  IP 194.5.220.43.39274 > 147.78.195.254.80: Flags [.], ack 854, win 497, options [nop,nop,TS val 1380610715 ecr 520503598], length 0
08:17:04.096339 net1  In  IP 194.5.220.43.39274 > 147.78.195.254.80: Flags [F.], seq 79, ack 854, win 501, options [nop,nop,TS val 1380610715 ecr 520503598], length 0
08:17:04.096450 net2  Out IP 147.78.195.254.80 > 194.5.220.43.39274: Flags [F.], seq 854, ack 80, win 509, options [nop,nop,TS val 520503604 ecr 1380610715], length 0
08:17:04.102609 net1  In  IP 194.5.220.43.39274 > 147.78.195.254.80: Flags [.], ack 855, win 501, options [nop,nop,TS val 1380610721 ecr 520503604], length 0


4) http to 195.141.200.73

[9:17] nb3:~% curl -s 195.141.200.73 > /dev/null ; echo $?
0

/ # tcpdump -ni any host 194.5.220.43
tcpdump: data link type LINUX_SLL2
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on any, link-type LINUX_SLL2 (Linux cooked v2), snapshot length 262144 bytes
08:18:05.951066 net2  In  IP 194.5.220.43.41484 > 195.141.200.73.80: Flags [S], seq 1556080700, win 64240, options [mss 1460,sackOK,TS val 765965336 ecr 0,nop,wscale 7], length 0
08:18:05.951106 net2  Out IP 195.141.200.73.80 > 194.5.220.43.41484: Flags [S.], seq 3465881361, ack 1556080701, win 65160, options [mss 1460,sackOK,TS val 3168643538 ecr 765965336,nop,wscale 7], length 0
08:18:05.958699 net2  In  IP 194.5.220.43.41484 > 195.141.200.73.80: Flags [.], ack 1, win 502, options [nop,nop,TS val 765965342 ecr 3168643538], length 0
08:18:05.958749 net2  In  IP 194.5.220.43.41484 > 195.141.200.73.80: Flags [P.], seq 1:79, ack 1, win 502, options [nop,nop,TS val 765965342 ecr 3168643538], length 78: HTTP: GET / HTTP/1.1
08:18:05.958763 net2  Out IP 195.141.200.73.80 > 194.5.220.43.41484: Flags [.], ack 79, win 509, options [nop,nop,TS val 3168643545 ecr 765965342], length 0
08:18:05.959216 net2  Out IP 195.141.200.73.80 > 194.5.220.43.41484: Flags [P.], seq 1:239, ack 79, win 509, options [nop,nop,TS val 3168643546 ecr 765965342], length 238: HTTP: HTTP/1.1 200 OK
08:18:05.959327 net2  Out IP 195.141.200.73.80 > 194.5.220.43.41484: Flags [P.], seq 239:854, ack 79, win 509, options [nop,nop,TS val 3168643546 ecr 765965342], length 615: HTTP
08:18:05.965244 net2  In  IP 194.5.220.43.41484 > 195.141.200.73.80: Flags [.], ack 239, win 501, options [nop,nop,TS val 765965350 ecr 3168643546], length 0
08:18:05.965348 net2  In  IP 194.5.220.43.41484 > 195.141.200.73.80: Flags [.], ack 854, win 497, options [nop,nop,TS val 765965350 ecr 3168643546], length 0
08:18:05.965487 net2  In  IP 194.5.220.43.41484 > 195.141.200.73.80: Flags [F.], seq 79, ack 854, win 501, options [nop,nop,TS val 765965350 ecr 3168643546], length 0
08:18:05.965573 net2  Out IP 195.141.200.73.80 > 194.5.220.43.41484: Flags [F.], seq 854, ack 80, win 509, options [nop,nop,TS val 3168643552 ecr 765965350], length 0
08:18:05.971916 net2  In  IP 194.5.220.43.41484 > 195.141.200.73.80: Flags [.], ack 855, win 501, options [nop,nop,TS val 765965356 ecr 3168643552], length 0



[0]
WireGuard "server" that changes the source IP:

/ # ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
3: eth0@if29: <BROADCAST,MULTICAST,UP,LOWER_UP,M-DOWN> mtu 1500 qdisc noqueue state UP
    link/ether 66:4a:9c:12:5b:6c brd ff:ff:ff:ff:ff:ff
    inet6 2a0a:e5c0:10:1e:7f21:83ca:a7d:46d2/128 scope global
       valid_lft forever preferred_lft forever
    inet6 fe80::644a:9cff:fe12:5b6c/64 scope link
       valid_lft forever preferred_lft forever
4: net1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000
    link/ether 3c:ec:ef:cb:d8:1b brd ff:ff:ff:ff:ff:ff
    inet 147.78.195.254/27 brd 147.78.195.255 scope global net1
       valid_lft forever preferred_lft forever
    inet6 2a0a:e5c0:1:8::53/64 scope global
       valid_lft forever preferred_lft forever
    inet6 fe80::3eec:efff:fecb:d81b/64 scope link
       valid_lft forever preferred_lft forever
5: v1477819464: <POINTOPOINT,NOARP,UP,LOWER_UP> mtu 1420 qdisc noqueue state UNKNOWN qlen 1000
    link/[65534]
    inet 147.78.194.65/26 scope global v1477819464
       valid_lft forever preferred_lft forever
    inet6 2a0a:e5c0:2e::1/64 scope global
       valid_lft forever preferred_lft forever
26: net2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000
    link/ether 3c:ec:ef:cb:d8:1c brd ff:ff:ff:ff:ff:ff
    inet 195.141.200.73/31 scope global net2
       valid_lft forever preferred_lft forever
    inet6 2001:1700:3500:2::12/124 scope global
       valid_lft forever preferred_lft forever
    inet6 fe80::3eec:efff:fecb:d81c/64 scope link
       valid_lft forever preferred_lft forever
/ #

WireGuard client behind NAT:

nb3:/etc/wireguard# curl -4 ifconfig.io
194.5.220.43
nb3:/etc/wireguard# ip a sh dev wlan0
2: wlan0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 84:5c:f3:ed:52:9c brd ff:ff:ff:ff:ff:ff
    inet 192.168.4.85/24 brd 192.168.4.255 scope global dynamic noprefixroute wlan0
       valid_lft 317sec preferred_lft 242sec
    inet6 2a0a:e5c0:13:0:865c:f3ff:feed:529c/64 scope global dynamic mngtmpaddr noprefixroute
       valid_lft 86394sec preferred_lft 14394sec
    inet6 fe80::865c:f3ff:feed:529c/64 scope link
       valid_lft forever preferred_lft forever
nb3:/etc/wireguard#


[1]
/ # ip route get 194.5.220.43
194.5.220.43 via 195.141.200.72 dev net2  src 195.141.200.73
/ #



* Re: Source IP incorrect on multi homed systems
  2023-02-19  8:01   ` Nico Schottelius
@ 2023-02-19  9:19     ` Mikma
  2023-02-19 12:04       ` Nico Schottelius
  2023-02-19 12:10     ` Nico Schottelius
       [not found]     ` <2ed829aaed9fec59ac2a9b32c4ce0a9005b8d8b850be81c81a226791855fe4eb@mu.id>
  2 siblings, 1 reply; 35+ messages in thread
From: Mikma @ 2023-02-19  9:19 UTC (permalink / raw)
  To: wireguard, Nico Schottelius, Mike O'Connor; +Cc: WireGuard mailing list

Have you tried setting the preferred src address of the route(s) to the addresses you desire?

From "man ip":

> src ADDRESS the source address to prefer when sending to the destinations covered by the route prefix. 
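
Concretely, the suggestion amounts to installing the route with
iproute2's `src` attribute. The Python sketch below only builds the
command line (it does not run it); the prefix, gateway and interface
come from the `ip route get` output posted earlier in the thread, and
`route_replace_argv` as well as the choice of 147.78.195.254 as the
preferred source are assumptions for illustration.

```python
import shlex
from typing import List, Optional


def route_replace_argv(prefix: str, gateway: str, dev: str,
                       src: Optional[str] = None) -> List[str]:
    """Build an `ip route replace` command, optionally pinning the preferred
    source address with iproute2's "src" attribute."""
    argv = ["ip", "route", "replace", prefix, "via", gateway, "dev", dev]
    if src is not None:
        argv += ["src", src]
    return argv


cmd = route_replace_argv("default", "195.141.200.72", "net2", src="147.78.195.254")
print(shlex.join(cmd))
# ip route replace default via 195.141.200.72 dev net2 src 147.78.195.254
```

Note that `src` pins one source address for all traffic using that
route, whichever address a peer contacted, and with BGP-learned routes
it would have to be set for every route the daemon installs.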


* Re: Source IP incorrect on multi homed systems
  2023-02-19  9:19     ` Mikma
@ 2023-02-19 12:04       ` Nico Schottelius
  0 siblings, 0 replies; 35+ messages in thread
From: Nico Schottelius @ 2023-02-19 12:04 UTC (permalink / raw)
  To: Mikma; +Cc: Nico Schottelius, Mike O'Connor, WireGuard mailing list


Hello Mikma,

Mikma <mikma.wg@lists.m7n.se> writes:

> Have you tried setting the preferred src address of the route(s) to the addresses you desire?
>
> From "man ip":
>
>> src ADDRESS the source address to prefer when sending to the destinations covered by the route prefix.

Unfortunately this does not solve the problem. The expected behaviour of
WireGuard is to reply from the same IP address it was contacted on, as
nginx and the kernel ICMP handler do, not from a route-based outgoing
interface address.

In a BGP-based environment the route can change dynamically; I showed a
stripped-down version to make it easier to understand. In practice, many
of our systems have 4-7 different upstreams: a packet can come in on any
interface and should leave the machine via the interface that is
currently correct according to the imported routes.

In no case, however, should WireGuard change the response address,
because this breaks stateful firewalls.

As demonstrated in my last email, both the in-kernel ICMP handler and
user-space applications like nginx behave correctly on the same machine.

I briefly checked the WireGuard source code and did not immediately
spot the part of the network handling that sets the source IP, so I am
wondering whether this bug is due to WireGuard not handling it at all.

Best regards,

Nico


* Re: Source IP incorrect on multi homed systems
  2023-02-19  8:01   ` Nico Schottelius
  2023-02-19  9:19     ` Mikma
@ 2023-02-19 12:10     ` Nico Schottelius
  2023-02-19 18:59       ` Peter Linder
       [not found]     ` <2ed829aaed9fec59ac2a9b32c4ce0a9005b8d8b850be81c81a226791855fe4eb@mu.id>
  2 siblings, 1 reply; 35+ messages in thread
From: Nico Schottelius @ 2023-02-19 12:10 UTC (permalink / raw)
  To: Nico Schottelius; +Cc: Mike O'Connor, WireGuard mailing list


Besides nginx and ICMP being handled correctly as a reference, I want
to elaborate further on this case to show that something really is
wrong with the current behaviour:

A typical setup for routers is to have many globally reachable IP
addresses (IPv6, IPv4) assigned to the loopback interface, as on this
system:

[13:11] router2.place6:~# ip a sh dev lo
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 2a0a:e5c0:1e:a::b/128 scope global
       valid_lft forever preferred_lft forever
    inet6 2a0a:e5c0:1e:a::a/128 scope global
       valid_lft forever preferred_lft forever
    inet6 2a0a:e5c0:2:a::b/128 scope global
       valid_lft forever preferred_lft forever
    inet6 2a0a:e5c0:2:a::a/128 scope global
       valid_lft forever preferred_lft forever
    inet6 2a0a:e5c0:2:1::7/128 scope global
       valid_lft forever preferred_lft forever
    inet6 2a0a:e5c0:2:1::6/128 scope global
       valid_lft forever preferred_lft forever
    inet6 2a0a:e5c0:2:1::5/128 scope global
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever

The motivation is that these IP addresses are always reachable,
independent of the interface the traffic is actually routed through.

If WireGuard selects the source IP based on the outgoing interface,
this can never work, as lo cannot send packets to the outside world.
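
The IPv6 counterpart of per-packet source selection is the RFC 3542
`IPV6_RECVPKTINFO`/`IPV6_PKTINFO` pair, which lets a UDP service answer
from the exact /128 on `lo` that a peer contacted while the egress
interface is still chosen by routing. A Python sketch, assuming the
Linux/BSD socket API; `serve_once6` and the two helpers are hypothetical
names, and nothing here describes what WireGuard itself does internally.

```python
import socket
import struct
from typing import Tuple

# struct in6_pktinfo { struct in6_addr ipi6_addr; unsigned int ipi6_ifindex; };
_IN6_PKTINFO = struct.Struct("=16sI")


def pack_pktinfo6(source: str, ifindex: int = 0) -> bytes:
    """IPV6_PKTINFO payload for sendmsg(): use `source` as the source address;
    ifindex 0 lets the routing table pick the egress interface."""
    return _IN6_PKTINFO.pack(socket.inet_pton(socket.AF_INET6, source), ifindex)


def parse_pktinfo6(data: bytes) -> Tuple[str, int]:
    """Decode a received IPV6_PKTINFO message into (destination address, ifindex)."""
    addr, ifindex = _IN6_PKTINFO.unpack(data[:_IN6_PKTINFO.size])
    return socket.inet_ntop(socket.AF_INET6, addr), ifindex


def serve_once6(port: int) -> None:
    """Echo one datagram back, sourced from the address the peer contacted,
    even if that address lives on lo."""
    sock = socket.socket(socket.AF_INET6, socket.SOCK_DGRAM)
    try:
        sock.setsockopt(socket.IPPROTO_IPV6, socket.IPV6_RECVPKTINFO, 1)
        sock.bind(("::", port))
        data, ancdata, _flags, peer = sock.recvmsg(
            2048, socket.CMSG_SPACE(_IN6_PKTINFO.size))
        cmsgs = []
        for level, ctype, cdata in ancdata:
            if level == socket.IPPROTO_IPV6 and ctype == socket.IPV6_PKTINFO:
                contacted, _ifindex = parse_pktinfo6(cdata)
                cmsgs = [(socket.IPPROTO_IPV6, socket.IPV6_PKTINFO,
                          pack_pktinfo6(contacted))]
        sock.sendmsg([data], cmsgs, 0, peer)
    finally:
        sock.close()
```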


> 08:18:05.965348 net2  In  IP 194.5.220.43.41484 > 195.141.200.73.80: Flags [.], ack 854, win 497, options [nop,nop,TS val 765965350 ecr 3168643546], length 0
> 08:18:05.965487 net2  In  IP 194.5.220.43.41484 > 195.141.200.73.80: Flags [F.], seq 79, ack 854, win 501, options [nop,nop,TS val 765965350 ecr 3168643546], length 0
> 08:18:05.965573 net2  Out IP 195.141.200.73.80 > 194.5.220.43.41484: Flags [F.], seq 854, ack 80, win 509, options [nop,nop,TS val 3168643552 ecr 765965350], length 0
> 08:18:05.971916 net2  In  IP 194.5.220.43.41484 > 195.141.200.73.80: Flags [.], ack 855, win 501, options [nop,nop,TS val 765965356 ecr 3168643552], length 0
>
>
>
> [0]
> wireguard "server" that changes the source ip:
>
> / # ip a
> 1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1000
>     link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
>     inet 127.0.0.1/8 scope host lo
>        valid_lft forever preferred_lft forever
>     inet6 ::1/128 scope host
>        valid_lft forever preferred_lft forever
> 3: eth0@if29: <BROADCAST,MULTICAST,UP,LOWER_UP,M-DOWN> mtu 1500 qdisc noqueue state UP
>     link/ether 66:4a:9c:12:5b:6c brd ff:ff:ff:ff:ff:ff
>     inet6 2a0a:e5c0:10:1e:7f21:83ca:a7d:46d2/128 scope global
>        valid_lft forever preferred_lft forever
>     inet6 fe80::644a:9cff:fe12:5b6c/64 scope link
>        valid_lft forever preferred_lft forever
> 4: net1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000
>     link/ether 3c:ec:ef:cb:d8:1b brd ff:ff:ff:ff:ff:ff
>     inet 147.78.195.254/27 brd 147.78.195.255 scope global net1
>        valid_lft forever preferred_lft forever
>     inet6 2a0a:e5c0:1:8::53/64 scope global
>        valid_lft forever preferred_lft forever
>     inet6 fe80::3eec:efff:fecb:d81b/64 scope link
>        valid_lft forever preferred_lft forever
> 5: v1477819464: <POINTOPOINT,NOARP,UP,LOWER_UP> mtu 1420 qdisc noqueue state UNKNOWN qlen 1000
>     link/[65534]
>     inet 147.78.194.65/26 scope global v1477819464
>        valid_lft forever preferred_lft forever
>     inet6 2a0a:e5c0:2e::1/64 scope global
>        valid_lft forever preferred_lft forever
> 26: net2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000
>     link/ether 3c:ec:ef:cb:d8:1c brd ff:ff:ff:ff:ff:ff
>     inet 195.141.200.73/31 scope global net2
>        valid_lft forever preferred_lft forever
>     inet6 2001:1700:3500:2::12/124 scope global
>        valid_lft forever preferred_lft forever
>     inet6 fe80::3eec:efff:fecb:d81c/64 scope link
>        valid_lft forever preferred_lft forever
> / #
>
> wireguard client behind nat:
>
> nb3:/etc/wireguard# curl -4 ifconfig.io
> 194.5.220.43
> nb3:/etc/wireguard# ip a sh dev wlan0
> 2: wlan0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
>     link/ether 84:5c:f3:ed:52:9c brd ff:ff:ff:ff:ff:ff
>     inet 192.168.4.85/24 brd 192.168.4.255 scope global dynamic noprefixroute wlan0
>        valid_lft 317sec preferred_lft 242sec
>     inet6 2a0a:e5c0:13:0:865c:f3ff:feed:529c/64 scope global dynamic mngtmpaddr noprefixroute
>        valid_lft 86394sec preferred_lft 14394sec
>     inet6 fe80::865c:f3ff:feed:529c/64 scope link
>        valid_lft forever preferred_lft forever
> nb3:/etc/wireguard#
>
>
> [1]
> / # ip route get 194.5.220.43
> 194.5.220.43 via 195.141.200.72 dev net2  src 195.141.200.73
> / #
>
>
> Mike O'Connor <mike@pineview.net> writes:
>
>> Generally, all OSs will use the address of the outgoing interface
>> for packets sent by a local process.
>>
>> If the packet is forwarded and no NAT is used, it will be routed via
>> the interface suggested by the routing table.
>>
>> So local routing can be a real pain; policy-based routing is one
>> option. The other option could be to set up an 'output' NAT to an
>> address which is multi-homed.
>>
>> I have a multi-homed system running without issues, other than the
>> actual routing machine. This machine is BGP-connected to three
>> locations.
>>
>> There is no NAT set up, because I also add the wireguard link
>> addresses to the BGP sessions.
>>
>> Cheers


--
Sustainable and modern Infrastructures by ungleich.ch
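The check in [1] can be extended into a small sketch that asks the kernel for two route lookups toward the peer: one by destination only, and one constrained to the address the request was sent to. The `run` wrapper and the `check_src` name are illustrative; by default the script only prints the commands, and `DRY_RUN=0` runs them (the output then depends on the local routing table).

```sh
#!/bin/sh
# Sketch: compare the source address the kernel would choose for a reply.
# Addresses are the ones from this thread; override via the environment.
PEER="${PEER:-194.5.220.43}"      # address the initiator sent from
LOCAL="${LOCAL:-147.78.195.254}"  # address the initiator sent to

# Print the commands by default; set DRY_RUN=0 to actually run them.
run() {
    if [ "${DRY_RUN:-1}" = 0 ]; then "$@"; else echo "$*"; fi
}

check_src() {
    # Lookup by destination only: roughly what a UDP socket bound to the
    # wildcard address gets (in [1]: via 195.141.200.72 dev net2
    # src 195.141.200.73).
    run ip route get "$1"
    # Lookup that keeps the address the request was sent to as source.
    run ip route get "$1" from "$2"
}

check_src "$PEER" "$LOCAL"
```

If the second lookup succeeds, the routing table itself permits replying from 147.78.195.254; which source a reply actually carries is then up to the sending socket.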


* Re: Source IP incorrect on multi homed systems
       [not found]     ` <2ed829aaed9fec59ac2a9b32c4ce0a9005b8d8b850be81c81a226791855fe4eb@mu.id>
@ 2023-02-19 12:13       ` Nico Schottelius
  2023-02-19 14:39         ` Christoph Loesch
  0 siblings, 1 reply; 35+ messages in thread
From: Nico Schottelius @ 2023-02-19 12:13 UTC (permalink / raw)
  To: Sebastian Hyrwall
  Cc: Nico Schottelius, Mike O'Connor, WireGuard mailing list


Hey Sebastian,

Sebastian Hyrwall <sh@keff.org> writes:

> It is kinda. It's been mentioned multiple times over the years but no one seems to want to fix it. At least you should be able to specify bind/src ip in the
> config. I gave up WG because of it. It wasn't accepted by my project's security policy since src ip could not be configured.
>
> There is an unofficial patch however,
>
> https://github.com/torvalds/linux/commit/5fa98082093344c86345f9f63305cae9d5f9f281

the binding is somewhat related to this issue, and I was looking for that
feature some time ago, too. While I would really appreciate binding
support, I am not sure whether the linked patch actually fixes the
problem I am seeing on multi-homed devices.

As long as wireguard does not reply with the same IP address it was
contacted with, packets will get dropped on stateful firewalls, because
the returning packet does not match the state session database.

Best regards,

Nico

--
Sustainable and modern Infrastructures by ungleich.ch


* Re: Source IP incorrect on multi homed systems
  2023-02-19 12:13       ` Nico Schottelius
@ 2023-02-19 14:39         ` Christoph Loesch
  2023-02-19 16:32           ` David Kerr
  2023-02-19 20:02           ` Nico Schottelius
  0 siblings, 2 replies; 35+ messages in thread
From: Christoph Loesch @ 2023-02-19 14:39 UTC (permalink / raw)
  To: wireguard

Hi,

I don't think it's that no one wants to fix it; several users have this issue. I'd rather guess that no one has found a suitable solution yet.

@Nico: did you try deleting the affected route and adding it again with the correct source IP? I described this in

https://lists.zx2c4.com/pipermail/wireguard/2021-November/007324.html

ip route del <NET>
ip route add <NET> dev <ALIAS_DEV> src <SRC_IP>

This way I was able to fix this issue (at least temporarily) on multi-homed systems.
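Christoph's two commands as a small sketch. It swaps in `ip route replace`, which installs or overwrites the route in one step instead of the del/add pair, so there is no moment without a route; the function name and the optional gateway argument are illustrative additions. The src has to be an address configured on the host, and a routing daemon that re-installs the route will drop the src again.

```sh
#!/bin/sh
# Sketch of the "re-add the route with an explicit src" workaround.
# Prints the command by default; set DRY_RUN=0 to execute it.
run() {
    if [ "${DRY_RUN:-1}" = 0 ]; then "$@"; else echo "$*"; fi
}

# $1 = destination network (<NET>), $2 = device (<ALIAS_DEV>),
# $3 = local source address (<SRC_IP>), $4 = optional gateway.
set_route_src() {
    if [ -n "$4" ]; then
        run ip route replace "$1" via "$4" dev "$2" src "$3"
    else
        run ip route replace "$1" dev "$2" src "$3"
    fi
}

# Example values from the documentation ranges (RFC 5737), not a real setup.
set_route_src 203.0.113.0/24 eth1 198.51.100.10 198.51.100.1
```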

Kind regards,
Christoph



* Re: Source IP incorrect on multi homed systems
  2023-02-19 14:39         ` Christoph Loesch
@ 2023-02-19 16:32           ` David Kerr
  2023-02-19 16:54             ` Sebastian Hyrvall
  2023-02-19 17:05             ` tlhackque
  2023-02-19 20:02           ` Nico Schottelius
  1 sibling, 2 replies; 35+ messages in thread
From: David Kerr @ 2023-02-19 16:32 UTC (permalink / raw)
  To: wireguard

Without getting into the debate of whether wireguard is acting
correctly or not, I think there is a possible workaround.

1. In the PREROUTING chain of the iptables mangle table, match the
incoming interface and destination address and use --set-xmark to set
a firewall MARK unique to this interface/destination pair.
2. Create a new ip route table that sets the default route to go out
on the interface with the source address you want (the same as the
destination address matched in iptables).
3. Create a new ip rule that sends all packets carrying the firewall
mark set in iptables to the routing table you just created.

Repeat above for each interface/address you need to mangle, with a
unique firewall mark and routing table for each.

It may be necessary to use CONNMARK (--save-mark in PREROUTING,
--restore-mark in OUTPUT).  I can't remember if this is needed or not;
it's been a while since I configured iptables with this.

This should ensure that any packet that comes into an
interface/address is replied to from the same interface/address.
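David's three steps, plus the CONNMARK save/restore he was unsure about, as a sketch. The mark set in PREROUTING belongs only to the incoming packet; the reply is a new, locally generated packet, so this sketch assumes the mark has to travel via the conntrack entry (--save-mark on the way in, --restore-mark in OUTPUT). Mark values, table numbers and the net1 gateway 192.0.2.1 are made up; the net2 gateway is the one from Nico's `ip route get` output. Untested: a reroute triggered in OUTPUT changes the path of the reply, but may not change a source address the sender already picked.

```sh
#!/bin/sh
# Sketch of the mark + routing table workaround, one interface/address
# pair per call. Marks, table numbers and the net1 gateway are
# hypothetical; prints commands unless DRY_RUN=0.
run() {
    if [ "${DRY_RUN:-1}" = 0 ]; then "$@"; else echo "$*"; fi
}

steer_by_address() {
    iface=$1 addr=$2 gw=$3 mark=$4 table=$5
    # Step 1: mark packets arriving on $iface for $addr and store the mark
    # on the conntrack entry, since the reply is a new local packet.
    run iptables -t mangle -A PREROUTING -i "$iface" -d "$addr" \
        -j MARK --set-xmark "$mark/0xffffffff"
    run iptables -t mangle -A PREROUTING -i "$iface" -d "$addr" \
        -j CONNMARK --save-mark
    # Step 2: own table whose default route leaves via $iface, src $addr.
    run ip route add default via "$gw" dev "$iface" src "$addr" table "$table"
    # Step 3: look up marked packets in that table.
    run ip rule add fwmark "$mark" table "$table"
}

# Copy the connection mark onto locally generated replies (added once).
run iptables -t mangle -A OUTPUT -j CONNMARK --restore-mark

steer_by_address net1 147.78.195.254 192.0.2.1 0x1 101       # gw hypothetical
steer_by_address net2 195.141.200.73 195.141.200.72 0x2 102  # gw from [1]
```

Each additional interface/address pair gets its own call with a unique mark and table, as described above.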

David




* Re: Source IP incorrect on multi homed systems
  2023-02-19 16:32           ` David Kerr
@ 2023-02-19 16:54             ` Sebastian Hyrvall
  2023-02-19 18:04               ` Janne Johansson
  2023-02-19 17:05             ` tlhackque
  1 sibling, 1 reply; 35+ messages in thread
From: Sebastian Hyrvall @ 2023-02-19 16:54 UTC (permalink / raw)
  To: wireguard

You should get into that debate. Proposing firewall workarounds is not a
correct solution, so please don't do it. It needs to be fixed. It's an
immature VPN solution that always just proposes workarounds instead of
fixing the problem. It seems to have been designed by people who are good
at software and cryptography but have no clue about networking stacks.

On 2023-02-19 23:32, David Kerr wrote:
> Without getting into the debate of whether wireguard is acting
> correctly or not, I think there is a possible workaround.
>
> 1. In the iptables mangle table PREROUTING, match the incoming
> interface and destination address and --set-xmark a firewall MARK
> unique to this interface/destination
> 2. Create a new ip route table that sets the default route to go out
> on the interface with the source address you want (same as destination
> address in iptables)
> 3. Create a new ip rule that sends all packets with firewall mark set
> in iptables to the routing table you just created
>
> Repeat above for each interface/address you need to mangle, with a
> unique firewall mark and routing table for each.
>
> It may be necessary to use CONNMARK in PREROUTING and OUTPUT to
> --restore-mark.  I can't remember if this is needed or not, it's been a
> while since I configured iptables with this.
>
> This should ensure that any packet that comes into an
> interface/address is replied to from the same interface/address.
>
> David


* Re: Source IP incorrect on multi homed systems
  2023-02-19 16:32           ` David Kerr
  2023-02-19 16:54             ` Sebastian Hyrvall
@ 2023-02-19 17:05             ` tlhackque
  2023-02-19 18:37               ` David Kerr
                                 ` (2 more replies)
  1 sibling, 3 replies; 35+ messages in thread
From: tlhackque @ 2023-02-19 17:05 UTC (permalink / raw)
  To: wireguard



FWIW, while clever, I don't think that iptables mark solves all cases.  
E.g., consider an interface with multiple addresses, where a packet 
comes in on a secondary address.  The proposed rule would send it out 
the right interface, but still with the wrong (primary) address picked 
from the interface...

With IPv6 it's common to assign an address to a service rather than a 
host so services can move easily.  So multiple addresses per interface 
are the rule, not the exception.

I do the same with IPv4 inside addresses, though these days public IPv4 
addresses are scarce enough that it's not common for public IPs.  It 
amounts to the same issue - the NAT tracking is stateful.

Trying to work around this with routing seems like a maze of twisty 
passages - so I agree that the right solution is for WG to respond from 
the address that receives a packet.

On 19-Feb-23 11:32, David Kerr wrote:
> Without getting into the debate of whether wireguard is acting
> correctly or not, I think there is a possible workaround.
>
> 1. In the iptables mangle table PREROUTING, match the incoming
> interface and destination address and --set-xmark a firewall MARK
> unique to this interface/destination
> 2. Create a new ip route table that sets the default route to go out
> on the interface with the source address you want (same as destination
> address in iptables)
> 3. Create a new ip rule that sends all packets with firewall mark set
> in iptables to the routing table you just created
>
> Repeat above for each interface/address you need to mangle, with a
> unique firewall mark and routing table for each.
>
> It may be necessary to use CONNMARK in PREROUTING and OUTPUT to
> --restore-mark.  I can't remember if this is needed or not, it's been a
> while since I configured iptables with this.
>
> This should ensure that any packet that comes into an
> interface/address is replied to from the same interface/address.
>
> David




* Re: Source IP incorrect on multi homed systems
  2023-02-19 16:54             ` Sebastian Hyrvall
@ 2023-02-19 18:04               ` Janne Johansson
  2023-02-19 18:08                 ` Sebastian Hyrvall
  2023-02-19 20:11                 ` Nico Schottelius
  0 siblings, 2 replies; 35+ messages in thread
From: Janne Johansson @ 2023-02-19 18:04 UTC (permalink / raw)
  To: Sebastian Hyrvall; +Cc: wireguard

On Sun, 19 Feb 2023 at 18:06, Sebastian Hyrvall <sh@keff.org> wrote:
>
> You should get into that debate. Proposing firewall workarounds is not a
> correct solution so please don't do it. It needs to be fixed. It's an
> immature VPN solution that always just proposed a workaround instead of
> fixing the problem.

I would make sure that you are not misattributing the problem* to "an
immature VPN" when it is really the kernel's default UDP behaviour of
picking a working interface to send packets from based on the routing
table, which any/all UDP-based tunnels would suffer from equally. If
you google it, you may find that other UDP transports face the same
"problem".

*) https://en.wiktionary.org/wiki/Chesterton%27s_fence

-- 
May the most significant bit of your life be positive.


* Re: Source IP incorrect on multi homed systems
  2023-02-19 18:04               ` Janne Johansson
@ 2023-02-19 18:08                 ` Sebastian Hyrvall
  2023-02-19 20:11                 ` Nico Schottelius
  1 sibling, 0 replies; 35+ messages in thread
From: Sebastian Hyrvall @ 2023-02-19 18:08 UTC (permalink / raw)
  To: Janne Johansson; +Cc: wireguard

It is the default behavior of the kernel, but all networking software
dealing with security knows how to behave correctly. You are welcome to
point me to something else that suffers from the same problem.

On 2023-02-20 01:04, Janne Johansson wrote:
> On Sun, 19 Feb 2023 at 18:06, Sebastian Hyrvall <sh@keff.org> wrote:
>> You should get into that debate. Proposing firewall workarounds is not a
>> correct solution so please don't do it. It needs to be fixed. It's an
>> immature VPN solution that always just proposed a workaround instead of
>> fixing the problem.
> I would make sure that you are not mis-ascribing the problem* to "an
> immature VPN" and not what the default UDP behaviour of the kernel is,
> to pick a working interface to send packets from based on the routing
> table, in which any/all udp based tunnel would suffer the same
> problem. If you google it, you may find that other udp transports face
> the same "problem".
>
> *) https://en.wiktionary.org/wiki/Chesterton%27s_fence
>


* Fwd: Source IP incorrect on multi homed systems
       [not found]               ` <CADGd2DoE6TCtCxxWL7JWyNW5+yy_Pe+9MNzHznbudMWLTXQreA@mail.gmail.com>
@ 2023-02-19 18:30                 ` John Lauro
  2023-02-19 22:28                 ` tlhackque
  1 sibling, 0 replies; 35+ messages in thread
From: John Lauro @ 2023-02-19 18:30 UTC (permalink / raw)
  To: WireGuard mailing list

I think the ip route with src would work, but only as a short-lived
workaround.  The problem with it, when dealing with dynamic routes, is
that the route could go away when a link goes down and then come back
without the src setting.  You would need the BGP software to add the
src.

UDP is connectionless.  Sending the reply back out the same way the
request came in isn't strictly the same thing here.  The streams are not
tied together the way they would be with TCP on nginx or with an ICMP
reply.  You should be able to whitelist the UDP port on the NAT devices,
as it shouldn't rely on state info.

I am not sure whether you are attempting site-to-site or client-to-
server/site, and which end has the NAT (or both).  What I do for
site-to-site is use a different port for each connection and have a
separate BGP connection for each possible connection (i.e. a different
one for each network provider).  I have a full mesh with 8 sites and
up to 3 providers per site.

That said, you probably have floating IPs on the client side, and
don't want to lock in a single IP on the multi-homed server side?  You
could NAT the incoming IPs on the border to an internal IP and then
lock the wireguard server to that single private IP for in/out; the
border NAT would then force the reply back through the same gateway it
came in from.

I know you don't want workarounds; I just want to mention that it's not
the same as comparing a single stream to something that handles routing
through it.  As you are doing BGP and redundant routes, I assume you
also reset rp_filter on all NAT/wireguard/router boxes so the routers
will allow packets to come from different sources.
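A sketch of relaxing reverse-path filtering as suggested above. Per the kernel's ip-sysctl documentation, the value used for an interface is the maximum of conf/all and conf/<iface>, so both are set here. Loose mode (2) is this sketch's choice rather than disabling the check (0); the `set_rp_filter` name is illustrative and the interface names are the ones from this thread.

```sh
#!/bin/sh
# Sketch: relax reverse-path filtering so asymmetric (multi-homed) paths
# are not dropped. Prints commands unless DRY_RUN=0.
run() {
    if [ "${DRY_RUN:-1}" = 0 ]; then "$@"; else echo "$*"; fi
}

# The kernel uses max(conf/all, conf/<iface>), so lower both.
# 2 = loose mode (source must be reachable via any interface),
# 0 = no source validation at all.
set_rp_filter() {
    mode=$1; shift
    run sysctl -w "net.ipv4.conf.all.rp_filter=$mode"
    for iface in "$@"; do
        run sysctl -w "net.ipv4.conf.$iface.rp_filter=$mode"
    done
}

set_rp_filter 2 net1 net2
```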

On Sun, Feb 19, 2023 at 12:07 PM tlhackque <tlhackque@yahoo.com> wrote:
>
> FWIW, while clever, I don't think that iptables mark solves all cases.
> E.g., consider an interface with multiple addresses, where a packet
> comes in on a secondary address.  The proposed rule would send it out
> the right interface, but still with the wrong (primary) address picked
> from the interface...
>
> With IPv6 it's common to assign an address to a service rather than a
> host so services can move easily.  So multiple addresses per interface
> are the rule, not the exception.
>
> I do the same with IPv4 inside addresses, though these days public IPv4
> addresses are scarce enough that it's not common for public IPs.  It
> amounts to the same issue - the NAT tracking is stateful.
>
> Trying to work around this with routing seems like a maze of twisty
> passages - so I agree that the right solution is for WG to respond from
> the address that receives a packet.


* Re: Source IP incorrect on multi homed systems
  2023-02-19 17:05             ` tlhackque
@ 2023-02-19 18:37               ` David Kerr
  2023-02-19 18:52                 ` tlhackque
  2023-02-19 18:42               ` tlhackque
       [not found]               ` <CADGd2DoE6TCtCxxWL7JWyNW5+yy_Pe+9MNzHznbudMWLTXQreA@mail.gmail.com>
  2 siblings, 1 reply; 35+ messages in thread
From: David Kerr @ 2023-02-19 18:37 UTC (permalink / raw)
  To: wireguard

My proposed workaround specifically stated to match on both the
interface and destination address, and to set a route with both
interface and [source] address.  This allows for multiple IP addresses
on the same interface -- which you can do with both IPv4 and IPv6.

But yes, it is a nasty hack.  You really need to understand what is
going on between the firewall and routing tables/rules and it is easy
to get confused.


On Sun, Feb 19, 2023 at 12:10 PM tlhackque <tlhackque@yahoo.com> wrote:
>
> FWIW, while clever, I don't think that iptables mark solves all cases.
> E.g., consider an interface with multiple addresses, where a packet
> comes in on a secondary address.  The proposed rule would send it out
> the right interface, but still with the wrong (primary) address picked
> from the interface...
>
> With IPv6 it's common to assign an address to a service rather than a
> host so services can move easily.  So multiple addresses per interface
> are the rule, not the exception.
>
> I do the same with IPv4 inside addresses, though these days public IPv4
> addresses are scarce enough that it's not common for public IPs.  It
> amounts to the same issue - the NAT tracking is stateful.
>
> Trying to work around this with routing seems like a maze of twisty
> passages - so I agree that the right solution is for WG to respond from
> the address that receives a packet.
>
> On 19-Feb-23 11:32, David Kerr wrote:
> > Without getting into the debate of whether wireguard is acting
> > correctly or not, I think there is a possible workaround.
> >
> > 1. In the iptables mangle table PREROUTING, match the incoming
> > interface and destination address and --set-xmark a firewall MARK
> > unique to this interface/destination
> > 2. Create a new ip route table that sets the default route to go out
> > on the interface with the source address you want (same as destination
> > address in iptables)
> > 3. Create a new ip rule that sends all packets with firewall mark set
> > in iptables to the routing table you just created
> >
> > Repeat above for each interface/address you need to mangle, with a
> > unique firewall mark and routing table for each.
> >
> > It may be necessary to use CONNMARK in PREROUTING and OUTPUT to
> > --restore_mark.  I can't remember if this is needed or not, its been a
> > while since I configured iptables with this.
> >
> > This should ensure that any packet that comes into an
> > interface/address is replied to from the same interface/address.
> >
> > David
> >
> >
> > On Sun, Feb 19, 2023 at 9:44 AM Christoph Loesch<wireguard-mail@chil.at>  wrote:
> >> Hi,
> >>
> >> I don't think no one wants to fix it, there are several users having this issue. I rather guess no one could find a suitable solution to fix it.
> >>
> >> @Nico: did you try to delete the affected route and add it again with the correct source IP ?
> >>
> >> as I mentioned in https://lists.zx2c4.com/pipermail/wireguard/2021-November/007324.html
> >>
> >> ip route del <NET>
> >> ip route add <NET> dev <ALIAS_DEV> src <SRC_IP>
> >>
> >> This way I was able to (at least temporarily) fix this issue on multi homed systems.
> >>
> >> Kind regards,
> >> Christoph
> >>
> >> On 19.02.2023 at 13:13, Nico Schottelius wrote:
> >>> Hey Sebastian,
> >>>
> >>> Sebastian Hyrwall <sh@keff.org> writes:
> >>>
> >>>> It is kinda. It's been mentioned multiple times over the years but no one seems to want to fix it. At least you should be able to specify bind/src ip in the
> >>>> config. I gave up WG because of it. It wasn't accepted by my project's security policy since the src ip could not be configured.
> >>>>
> >>>> There is an unofficial patch however,
> >>>>
> >>>> https://github.com/torvalds/linux/commit/5fa98082093344c86345f9f63305cae9d5f9f281
> >>> the binding is somewhat related to this issue and I was looking for that
> >>> feature some time ago, too. While it is correlated and I would really
> >>> appreciate binding support, I am not sure whether the linked patch does
> >>> actually fix the problem I am seeing in multi homed devices.
> >>>
> >>> As long as wireguard does not reply with the same IP address it was
> >>> contacted with, packets will get dropped on stateful firewalls, because
> >>> the returning packet does not match the state session database.
> >>>
> >>> Best regards,
> >>>
> >>> Nico
> >>>
> >>> --
> >>> Sustainable and modern Infrastructures by ungleich.ch
>
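David's recipe can be made concrete with a small Python helper that prints, for each interface/address pair, the mark (step 1), the per-uplink routing table (step 2) and the `ip rule` (step 3), plus the CONNMARK save/restore he was unsure about. This is a sketch under assumptions, not a tested configuration: `Uplink` and `policy_routing_commands` are hypothetical names, numbering marks and tables from 100 is arbitrary, `<GW_NET1>` is a placeholder because the net1 gateway does not appear in the thread, and 195.141.200.72 is the net2 next hop from Nico's `ip route get` output. Whether the mark actually changes the source address WireGuard picks for its replies is not something this sketch establishes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Uplink:
    iface: str    # interface the packet arrives on, e.g. "net1"
    address: str  # local address it was sent to; replies should use it
    gateway: str  # next hop on that uplink (placeholder if unknown)


def policy_routing_commands(uplinks, first_id=100):
    """Render the mark -> table -> rule recipe as shell command strings.

    Hypothetical helper: each uplink gets its own mark and routing table
    (same number, for readability). Review the output before running it.
    """
    cmds = [
        # Copy a saved connection mark back onto later packets, including
        # locally generated ones in OUTPUT (David was unsure this is needed).
        "iptables -t mangle -A PREROUTING -j CONNMARK --restore-mark",
        "iptables -t mangle -A OUTPUT -j CONNMARK --restore-mark",
    ]
    for n, up in enumerate(uplinks, start=first_id):
        mark = hex(n)
        # Step 1: mark by incoming interface + destination address, then
        # save the mark on the connection so it can be restored later.
        cmds.append(
            f"iptables -t mangle -A PREROUTING -i {up.iface} -d {up.address} "
            f"-j MARK --set-xmark {mark}/0xffffffff"
        )
        cmds.append(
            f"iptables -t mangle -A PREROUTING -i {up.iface} -d {up.address} "
            f"-j CONNMARK --save-mark"
        )
        # Step 2: a table whose default route leaves via this uplink,
        # with the wanted source address as the route's src hint.
        cmds.append(
            f"ip route add default via {up.gateway} dev {up.iface} "
            f"src {up.address} table {n}"
        )
        # Step 3: send packets carrying this mark to that table.
        cmds.append(f"ip rule add fwmark {mark} table {n}")
    return cmds
```

For Nico's two uplinks, `print("\n".join(policy_routing_commands([Uplink("net1", "147.78.195.254", "<GW_NET1>"), Uplink("net2", "195.141.200.73", "195.141.200.72")])))` prints ten commands. Every additional address means four more lines to keep in sync by hand.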

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Source IP incorrect on multi homed systems
  2023-02-19 17:05             ` tlhackque
  2023-02-19 18:37               ` David Kerr
@ 2023-02-19 18:42               ` tlhackque
  2023-02-19 20:18                 ` Nico Schottelius
       [not found]               ` <CADGd2DoE6TCtCxxWL7JWyNW5+yy_Pe+9MNzHznbudMWLTXQreA@mail.gmail.com>
  2 siblings, 1 reply; 35+ messages in thread
From: tlhackque @ 2023-02-19 18:42 UTC (permalink / raw)
  To: wireguard



BTW, DNS is a common UDP (well, mostly) protocol that encountered the 
same issue.
See RFC 2181 <https://www.rfc-editor.org/rfc/rfc2181.html> (1997), where 
you'll find (emphasis added):
> 4 <https://www.rfc-editor.org/rfc/rfc2181.html#section-4>. Server 
> Reply Source Address Selection
>
>     Most, if not all, DNS clients, expect the address from which a reply
>     is received to be the same address as that to which the query
>     eliciting the reply was sent.  This is true for servers acting as
>     clients for the purposes of recursive query resolution, as well as
>     simple resolver clients.  The address, along with the identifier (ID)
>     in the reply is used for disambiguating replies, and filtering
>     spurious responses.  This may, or may not, have been intended when
>     the DNS was designed, but is now a fact of life.
>
>     Some multi-homed hosts running DNS servers generate a reply using a
>     source address that is not the same as the destination address from
>     the client's request packet.  *Such replies will be discarded by the
>     client because the source address of the reply does not match that
>     of a host to which the client sent the original request.*  That is,
>     it appears to be an unsolicited response.
>
> 4.1 <https://www.rfc-editor.org/rfc/rfc2181.html#section-4.1>. UDP
> Source Address Selection
>
>     *To avoid these problems, servers when responding to queries using
>     UDP _must_ cause the reply to be sent with the source address field
>     in the IP header set to the address that was in the destination
>     address field of the IP header of the packet containing the query
>     causing the response.*  If this would cause the response to be sent from an IP
>     address that is not permitted for this purpose, then the response may
>     be sent from any legal IP address allocated to the server.  That
>     address should be chosen to maximise the possibility that the client
>     will be able to use it for further queries.  Servers configured in
>     such a way that not all their addresses are equally reachable from
>     all potential clients need take particular care when responding to
>     queries sent to anycast, multicast, or similar, addresses.
>
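The rule quoted above reduces to a small decision: answer from the address the query was sent to when that address is ours and usable as a source, otherwise fall back to some other legal local address. A minimal Python sketch; `rfc2181_reply_source` and its `preferred` parameter are hypothetical names, and treating a multicast destination as "not permitted" is an assumption standing in for the RFC's anycast/multicast caveat.

```python
import ipaddress


def rfc2181_reply_source(query_dst, local_addrs, preferred=None):
    """Pick the source address for a UDP reply, after RFC 2181 section 4.1.

    query_dst:   destination address of the received query
    local_addrs: addresses assigned to this host
    preferred:   fallback when query_dst may not be used as a source
                 (the RFC asks for the one most likely to be reachable)
    """
    dst = ipaddress.ip_address(query_dst)
    local = [ipaddress.ip_address(a) for a in local_addrs]
    # Normal case: reply from exactly the address the client talked to.
    # Assumption: a multicast destination is never a legal source.
    if dst in local and not dst.is_multicast:
        return str(dst)
    # Fallback clause: any legal address of ours, ideally a reachable one.
    if preferred is not None:
        return str(ipaddress.ip_address(preferred))
    # Last resort in this sketch: first non-loopback, non-multicast address.
    for addr in local:
        if not (addr.is_multicast or addr.is_loopback):
            return str(addr)
    raise ValueError("no legal source address available")
```

With Nico's addresses, a query to 147.78.195.254 is answered from 147.78.195.254 and one to 195.141.200.73 from 195.141.200.73, regardless of which uplink the routing table would choose.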


On 19-Feb-23 12:05, tlhackque wrote:
> FWIW, while clever, I don't think that iptables mark solves all cases. 
> E.g., consider an interface with multiple addresses, where a packet 
> comes in on a secondary address.  The proposed rule would send it out 
> the right interface, but still with the wrong (primary) address picked 
> from the interface...
>
> With IPv6 it's common to assign an address to a service rather than a 
> host so services can move easily.  So multiple addresses per interface 
> are the rule, not the exception.
>
> I do the same with IPv4 inside addresses, though these days public 
> IPv4 addresses are scarce enough that it's not common for public IPs.  
> It amounts to the same issue - the NAT tracking is stateful.
>
> Trying to work around this with routing seems like a maze of twisty 
> passages - so I agree that the right solution is for WG to respond 
> from the address that receives a packet.
>
> On 19-Feb-23 11:32, David Kerr wrote:
>> Without getting into the debate of whether wireguard is acting
>> correctly or not, I think there is a possible workaround.
>>
>> 1. In the iptables mangle table PREROUTING, match the incoming
>> interface and destination address and --set-xmark a firewall MARK
>> unique to this interface/destination
>> 2. Create a new ip route table that sets the default route to go out
>> on the interface with the source address you want (same as destination
>> address in iptables)
>> 3. Create a new ip rule that sends all packets with firewall mark set
>> in iptables to the routing table you just created
>>
>> Repeat above for each interface/address you need to mangle, with a
>> unique firewall mark and routing table for each.
>>
>> It may be necessary to use CONNMARK in PREROUTING and OUTPUT to
>> --restore-mark.  I can't remember if this is needed or not, it's been a
>> while since I configured iptables with this.
>>
>> This should ensure that any packet that comes into an
>> interface/address is replied to from the same interface/address.
>>
>> David
>>
>>
>> On Sun, Feb 19, 2023 at 9:44 AM Christoph Loesch <wireguard-mail@chil.at> wrote:
>>> Hi,
>>>
>>> I don't think it's that no one wants to fix it; several users have
>>> this issue. I'd rather guess that no one has found a suitable solution
>>> yet.
>>>
>>> @Nico: did you try to delete the affected route and add it again 
>>> with the correct source IP ?
>>>
>>> as I mentioned in
>>> https://lists.zx2c4.com/pipermail/wireguard/2021-November/007324.html
>>>
>>> ip route del <NET>
>>> ip route add <NET> dev <ALIAS_DEV> src <SRC_IP>
>>>
>>> This way I was able to (at least temporarily) fix this issue on multi
>>> homed systems.
>>>
>>> Kind regards,
>>> Christoph
>>>
>>> On 19.02.2023 at 13:13, Nico Schottelius wrote:
>>>> Hey Sebastian,
>>>>
>>>> Sebastian Hyrwall <sh@keff.org> writes:
>>>>
>>>>> It is kinda. It's been mentioned multiple times over the years but
>>>>> no one seems to want to fix it. At least you should be able to
>>>>> specify bind/src ip in the config. I gave up WG because of it. It
>>>>> wasn't accepted by my project's security policy since the src ip
>>>>> could not be configured.
>>>>>
>>>>> There is an unofficial patch however,
>>>>>
>>>>> https://github.com/torvalds/linux/commit/5fa98082093344c86345f9f63305cae9d5f9f281 
>>>>>
>>>> the binding is somewhat related to this issue and I was looking for that
>>>> feature some time ago, too. While it is correlated and I would really
>>>> appreciate binding support, I am not sure whether the linked patch does
>>>> actually fix the problem I am seeing in multi homed devices.
>>>>
>>>> As long as wireguard does not reply with the same IP address it was
>>>> contacted with, packets will get dropped on stateful firewalls, because
>>>> the returning packet does not match the state session database.
>>>>
>>>> Best regards,
>>>>
>>>> Nico
>>>>
>>>> -- 
>>>> Sustainable and modern Infrastructures by ungleich.ch
>

-- 
This communication may not represent my employer's views,
if any, on the matters discussed.



^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Source IP incorrect on multi homed systems
  2023-02-19 18:37               ` David Kerr
@ 2023-02-19 18:52                 ` tlhackque
  0 siblings, 0 replies; 35+ messages in thread
From: tlhackque @ 2023-02-19 18:52 UTC (permalink / raw)
  To: wireguard



On 19-Feb-23 13:37, David Kerr wrote:
> My proposed workaround specifically stated to match on both the
> interface and destination address, and to set a route with both
> interface and [source] address.  This allows for multiple IP addresses
> on the same interface -- which you can do with both IPv4 and IPv6.

Fair enough.  Of course, that means having a unique rule and mark for 
each if/destination address, which you now have to manage - and avoid 
conflicts with all other uses of mark.  One of which is wg-quick...

"manage" includes remembering to add/remove the rule and 
allocate/deallocate the mark synchronously with wg-enabled IP addresses 
- and if wg is listening on all addresses, that means every IP address.

You can get there, but as I said, it's a maze of twisty passages and the 
complications of managing it pile up.
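The bookkeeping described here can be sketched as a small allocator that keeps one mark per interface/address pair in sync with the current address list and never hands out reserved marks. A minimal Python sketch; `MarkAllocator` and its range are hypothetical. The reserved set is where a mark used by another tool would go, for example the fwmark wg-quick configures when it manages a default route (typically 51820, but check your own setup). The `added`/`removed` results are what would still have to be turned into iptables and `ip rule` changes.

```python
class MarkAllocator:
    """Hand out one firewall mark per (interface, address) pair.

    Marks in `reserved` are never handed out. Releasing a pair returns its
    mark to the pool; that is the step that must happen whenever an
    address is removed, or marks and rules drift out of sync.
    """

    def __init__(self, first, last, reserved=()):
        reserved = set(reserved)
        self._free = [m for m in range(first, last + 1) if m not in reserved]
        self._by_pair = {}

    def sync(self, pairs):
        """Make allocations match `pairs`; return (added, removed) dicts."""
        wanted = set(pairs)
        # Release marks of pairs whose address has gone away.
        removed = {p: self._by_pair.pop(p) for p in list(self._by_pair) if p not in wanted}
        self._free = sorted(self._free + list(removed.values()))
        # Allocate the lowest free mark to each new pair.
        added = {}
        for pair in sorted(wanted - set(self._by_pair)):
            if not self._free:
                raise RuntimeError("mark range exhausted")
            mark = self._free.pop(0)
            self._by_pair[pair] = mark
            added[pair] = mark
        return added, removed

    def mark_for(self, iface, addr):
        return self._by_pair[(iface, addr)]
```

Calling `sync()` from whatever adds or removes addresses keeps the mapping current, but the resulting rule changes must still be applied in the same step, which is the synchronisation burden described above.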


> But yes, it is a nasty hack.  You really need to understand what is
> going on between the firewall and routing tables/rules and it is easy
> to get confused.
>
>
>> On Sun, Feb 19, 2023 at 12:10 PM tlhackque <tlhackque@yahoo.com> wrote:
>> FWIW, while clever, I don't think that iptables mark solves all cases.
>> E.g., consider an interface with multiple addresses, where a packet
>> comes in on a secondary address.  The proposed rule would send it out
>> the right interface, but still with the wrong (primary) address picked
>> from the interface...
>>
>> With IPv6 it's common to assign an address to a service rather than a
>> host so services can move easily.  So multiple addresses per interface
>> are the rule, not the exception.
>>
>> I do the same with IPv4 inside addresses, though these days public IPv4
>> addresses are scarce enough that it's not common for public IPs.  It
>> amounts to the same issue - the NAT tracking is stateful.
>>
>> Trying to work around this with routing seems like a maze of twisty
>> passages - so I agree that the right solution is for WG to respond from
>> the address that receives a packet.
>>
>> On 19-Feb-23 11:32, David Kerr wrote:
>>> Without getting into the debate of whether wireguard is acting
>>> correctly or not, I think there is a possible workaround.
>>>
>>> 1. In the iptables mangle table PREROUTING, match the incoming
>>> interface and destination address and --set-xmark a firewall MARK
>>> unique to this interface/destination
>>> 2. Create a new ip route table that sets the default route to go out
>>> on the interface with the source address you want (same as destination
>>> address in iptables)
>>> 3. Create a new ip rule that sends all packets with firewall mark set
>>> in iptables to the routing table you just created
>>>
>>> Repeat above for each interface/address you need to mangle, with a
>>> unique firewall mark and routing table for each.
>>>
>>> It may be necessary to use CONNMARK in PREROUTING and OUTPUT to
>>> --restore-mark.  I can't remember if this is needed or not, it's been a
>>> while since I configured iptables with this.
>>>
>>> This should ensure that any packet that comes into an
>>> interface/address is replied to from the same interface/address.
>>>
>>> David
>>>
>>>
>>> On Sun, Feb 19, 2023 at 9:44 AM Christoph Loesch <wireguard-mail@chil.at> wrote:
>>>> Hi,
>>>>
>>>> I don't think it's that no one wants to fix it; several users have this issue. I'd rather guess that no one has found a suitable solution yet.
>>>>
>>>> @Nico: did you try to delete the affected route and add it again with the correct source IP ?
>>>>
>>>> as I mentioned in https://lists.zx2c4.com/pipermail/wireguard/2021-November/007324.html
>>>>
>>>> ip route del <NET>
>>>> ip route add <NET> dev <ALIAS_DEV> src <SRC_IP>
>>>>
>>>> This way I was able to (at least temporarily) fix this issue on multi homed systems.
>>>>
>>>> Kind regards,
>>>> Christoph
>>>>
>>>> On 19.02.2023 at 13:13, Nico Schottelius wrote:
>>>>> Hey Sebastian,
>>>>>
>>>>> Sebastian Hyrwall <sh@keff.org> writes:
>>>>>
>>>>>> It is kinda. It's been mentioned multiple times over the years but no one seems to want to fix it. At least you should be able to specify bind/src ip in the
>>>>>> config. I gave up WG because of it. It wasn't accepted by my project's security policy since the src ip could not be configured.
>>>>>>
>>>>>> There is an unofficial patch however,
>>>>>>
>>>>>> https://github.com/torvalds/linux/commit/5fa98082093344c86345f9f63305cae9d5f9f281
>>>>> the binding is somewhat related to this issue and I was looking for that
>>>>> feature some time ago, too. While it is correlated and I would really
>>>>> appreciate binding support, I am not sure whether the linked patch does
>>>>> actually fix the problem I am seeing in multi homed devices.
>>>>>
>>>>> As long as wireguard does not reply with the same IP address it was
>>>>> contacted with, packets will get dropped on stateful firewalls, because
>>>>> the returning packet does not match the state session database.
>>>>>
>>>>> Best regards,
>>>>>
>>>>> Nico
>>>>>
>>>>> --
>>>>> Sustainable and modern Infrastructures by ungleich.ch


-- 
This communication may not represent my employer's views,
if any, on the matters discussed.



^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Source IP incorrect on multi homed systems
  2023-02-19 12:10     ` Nico Schottelius
@ 2023-02-19 18:59       ` Peter Linder
  0 siblings, 0 replies; 35+ messages in thread
From: Peter Linder @ 2023-02-19 18:59 UTC (permalink / raw)
  To: wireguard

Indeed this is how you typically set up a multihomed service (addresses 
on lo and then announce that using BGP or something).

If you use one of the network links directly for the service and that 
link network goes down (it may not even be in your AS so you may not 
know?) then the service is offline.

Use a route-map in your BGP config to set the src address of routes to
the address on lo; that works for wg :)

/Peter
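Mike's explanation (quoted below) that a local process gets the outgoing interface's address, Christoph's `ip route add <NET> dev <ALIAS_DEV> src <SRC_IP>` and the route-map above all act on the same step: deriving the source address of a locally generated packet from the chosen route. A toy Python model of that step, using the addresses from Nico's `ip route get` output; `Route` and `pick_source` are hypothetical, and falling back to the first address on the outgoing device is a simplification of what the kernel really does. It also illustrates the secondary-address concern raised earlier: with two addresses on one interface and no `src` hint, the first one wins.

```python
import ipaddress
from dataclasses import dataclass
from typing import Optional


@dataclass
class Route:
    prefix: str
    dev: str
    src: Optional[str] = None  # the route's "src" hint, if any


def pick_source(dst, routes, iface_addrs):
    """Toy model of source selection for a locally generated packet.

    Longest-prefix match on `routes`; use the route's src hint if present,
    otherwise the first address configured on the outgoing device (a
    deliberate simplification of the kernel's real fallback logic).
    """
    d = ipaddress.ip_address(dst)
    matches = [r for r in routes if d in ipaddress.ip_network(r.prefix)]
    if not matches:
        raise LookupError(f"no route to {dst}")
    best = max(matches, key=lambda r: ipaddress.ip_network(r.prefix).prefixlen)
    src = best.src or iface_addrs[best.dev][0]
    return best.dev, src
```

With only a default route via net2, `pick_source("194.5.220.43", ...)` yields `("net2", "195.141.200.73")`, the same answer as Nico's `ip route get`; adding a more specific route with `src="147.78.195.254"` changes only the source, which is why the hint helps but has to follow every BGP route change.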


On 2023-02-19 13:10, Nico Schottelius wrote:
> Aside from nginx + icmp being handled correctly as a reference,
> I want to further elaborate on this case to show that something is
> really wrong with the current behaviour:
>
> A typical scenario for routers is to have a lot of globally reachable IP
> addresses (IPv6, IPv4) assigned to the loopback interface, such as this
> system:
>
> [13:11] router2.place6:~# ip a sh dev lo
> 1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
>      link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
>      inet 127.0.0.1/8 scope host lo
>         valid_lft forever preferred_lft forever
>      inet6 2a0a:e5c0:1e:a::b/128 scope global
>         valid_lft forever preferred_lft forever
>      inet6 2a0a:e5c0:1e:a::a/128 scope global
>         valid_lft forever preferred_lft forever
>      inet6 2a0a:e5c0:2:a::b/128 scope global
>         valid_lft forever preferred_lft forever
>      inet6 2a0a:e5c0:2:a::a/128 scope global
>         valid_lft forever preferred_lft forever
>      inet6 2a0a:e5c0:2:1::7/128 scope global
>         valid_lft forever preferred_lft forever
>      inet6 2a0a:e5c0:2:1::6/128 scope global
>         valid_lft forever preferred_lft forever
>      inet6 2a0a:e5c0:2:1::5/128 scope global
>         valid_lft forever preferred_lft forever
>      inet6 ::1/128 scope host
>         valid_lft forever preferred_lft forever
>
> The motivation behind that is that independent of the actual routing
> interface, these IP addresses are always reachable.
>
> Now in the case of wireguard selecting the source IP based on the
> outgoing interface, this is never going to work, as lo cannot send
> packets to the outside world.
>
>
> Nico Schottelius <nico.schottelius@ungleich.ch> writes:
>
>> Let me rephrase the problem statement:
>>
>>      - ping and http calls to the multi homed machine work correctly:
>>        I can ping 147.78.195.254 and the reply contains the same address.
>>        I can ping 195.141.200.73 and the reply contains the same address.
>>        I can curl 147.78.195.254 and the reply contains the same address.
>>        I can curl 195.141.200.73 and the reply contains the same address.
>>
>>      - wireguard does NOT work because it changes the reply address:
>>        A packet sent to 147.78.195.254 is being replied with 195.141.200.73
>>
>> In general, processes reply with the IP address that was used to contact
>> them and not with the outgoing interface address, which would also break
>> adding IP addresses to the loopback interface.
>>
>> For full detail, see ip addresses [0] and routing below [1] and tests
>> executed [2].
>>
>> I believe that this is a bug in wireguard.
>>
>> --------------------------------------------------------------------------------
>>
>> [2]
>>
>> Let's see what it looks like in detail:
>>
>> 1) ping to 147.78.195.254: works
>>
>> [9:14] nb3:~% ping -c2 147.78.195.254
>> PING 147.78.195.254 (147.78.195.254) 56(84) bytes of data.
>> 64 bytes from 147.78.195.254: icmp_seq=1 ttl=53 time=7.27 ms
>> 64 bytes from 147.78.195.254: icmp_seq=2 ttl=53 time=6.30 ms
>>
>> --- 147.78.195.254 ping statistics ---
>> 2 packets transmitted, 2 received, 0% packet loss, time 1002ms
>> rtt min/avg/max/mdev = 6.296/6.781/7.267/0.485 ms
>>
>> / # tcpdump -ni any host 194.5.220.43
>> tcpdump: data link type LINUX_SLL2
>> tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
>> listening on any, link-type LINUX_SLL2 (Linux cooked v2), snapshot length 262144 bytes
>> 08:14:48.379618 net1  In  IP 194.5.220.43 > 147.78.195.254: ICMP echo request, id 89, seq 1, length 64
>> 08:14:48.379651 net2  Out IP 147.78.195.254 > 194.5.220.43: ICMP echo reply, id 89, seq 1, length 64
>> 08:14:49.380340 net1  In  IP 194.5.220.43 > 147.78.195.254: ICMP echo request, id 89, seq 2, length 64
>> 08:14:49.380392 net2  Out IP 147.78.195.254 > 194.5.220.43: ICMP echo reply, id 89, seq 2, length 64
>>
>> 2) ping to 195.141.200.73
>>
>> [9:14] nb3:~% ping -c2 195.141.200.73
>> PING 195.141.200.73 (195.141.200.73) 56(84) bytes of data.
>> 64 bytes from 195.141.200.73: icmp_seq=1 ttl=53 time=11.3 ms
>> 64 bytes from 195.141.200.73: icmp_seq=2 ttl=53 time=6.81 ms
>>
>> --- 195.141.200.73 ping statistics ---
>> 2 packets transmitted, 2 received, 0% packet loss, time 1002ms
>> rtt min/avg/max/mdev = 6.813/9.057/11.301/2.244 ms
>> [9:15] nb3:~%
>>
>> / # tcpdump -ni any host 194.5.220.43
>> tcpdump: data link type LINUX_SLL2
>> tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
>> listening on any, link-type LINUX_SLL2 (Linux cooked v2), snapshot length 262144 bytes
>> 08:16:19.257697 net2  In  IP 194.5.220.43 > 195.141.200.73: ICMP echo request, id 91, seq 1, length 64
>> 08:16:19.257730 net2  Out IP 195.141.200.73 > 194.5.220.43: ICMP echo reply, id 91, seq 1, length 64
>> 08:16:20.250948 net2  In  IP 194.5.220.43 > 195.141.200.73: ICMP echo request, id 91, seq 2, length 64
>> 08:16:20.250980 net2  Out IP 195.141.200.73 > 194.5.220.43: ICMP echo reply, id 91, seq 2, length 64
>>
>> 3) http to 147.78.195.254
>>
>> [9:16] nb3:~% curl -s 147.78.195.254 > /dev/null ; echo $?
>> 0
>> / # tcpdump -ni any host 194.5.220.43
>> tcpdump: data link type LINUX_SLL2
>> tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
>> listening on any, link-type LINUX_SLL2 (Linux cooked v2), snapshot length 262144 bytes
>> 08:17:04.082945 net1  In  IP 194.5.220.43.39274 > 147.78.195.254.80: Flags [S], seq 1405408358, win 64240, options [mss 1460,sackOK,TS val 1380610701 ecr 0,nop,wscale 7], length 0
>> 08:17:04.082983 net2  Out IP 147.78.195.254.80 > 194.5.220.43.39274: Flags [S.], seq 3790092363, ack 1405408359, win 65160, options [mss 1460,sackOK,TS val 520503591 ecr 1380610701,nop,wscale 7], length 0
>> 08:17:04.089996 net1  In  IP 194.5.220.43.39274 > 147.78.195.254.80: Flags [.], ack 1, win 502, options [nop,nop,TS val 1380610709 ecr 520503591], length 0
>> 08:17:04.090121 net1  In  IP 194.5.220.43.39274 > 147.78.195.254.80: Flags [P.], seq 1:79, ack 1, win 502, options [nop,nop,TS val 1380610709 ecr 520503591], length 78: HTTP: GET / HTTP/1.1
>> 08:17:04.090136 net2  Out IP 147.78.195.254.80 > 194.5.220.43.39274: Flags [.], ack 79, win 509, options [nop,nop,TS val 520503598 ecr 1380610709], length 0
>> 08:17:04.090301 net2  Out IP 147.78.195.254.80 > 194.5.220.43.39274: Flags [P.], seq 1:239, ack 79, win 509, options [nop,nop,TS val 520503598 ecr 1380610709], length 238: HTTP: HTTP/1.1 200 OK
>> 08:17:04.090381 net2  Out IP 147.78.195.254.80 > 194.5.220.43.39274: Flags [P.], seq 239:854, ack 79, win 509, options [nop,nop,TS val 520503598 ecr 1380610709], length 615: HTTP
>> 08:17:04.096058 net1  In  IP 194.5.220.43.39274 > 147.78.195.254.80: Flags [.], ack 239, win 501, options [nop,nop,TS val 1380610715 ecr 520503598], length 0
>> 08:17:04.096059 net1  In  IP 194.5.220.43.39274 > 147.78.195.254.80: Flags [.], ack 854, win 497, options [nop,nop,TS val 1380610715 ecr 520503598], length 0
>> 08:17:04.096339 net1  In  IP 194.5.220.43.39274 > 147.78.195.254.80: Flags [F.], seq 79, ack 854, win 501, options [nop,nop,TS val 1380610715 ecr 520503598], length 0
>> 08:17:04.096450 net2  Out IP 147.78.195.254.80 > 194.5.220.43.39274: Flags [F.], seq 854, ack 80, win 509, options [nop,nop,TS val 520503604 ecr 1380610715], length 0
>> 08:17:04.102609 net1  In  IP 194.5.220.43.39274 > 147.78.195.254.80: Flags [.], ack 855, win 501, options [nop,nop,TS val 1380610721 ecr 520503604], length 0
>>
>>
>> 4) http to 195.141.200.73
>>
>> [9:17] nb3:~% curl -s 195.141.200.73 > /dev/null ; echo $?
>> 0
>>
>> / # tcpdump -ni any host 194.5.220.43
>> tcpdump: data link type LINUX_SLL2
>> tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
>> listening on any, link-type LINUX_SLL2 (Linux cooked v2), snapshot length 262144 bytes
>> 08:18:05.951066 net2  In  IP 194.5.220.43.41484 > 195.141.200.73.80: Flags [S], seq 1556080700, win 64240, options [mss 1460,sackOK,TS val 765965336 ecr 0,nop,wscale 7], length 0
>> 08:18:05.951106 net2  Out IP 195.141.200.73.80 > 194.5.220.43.41484: Flags [S.], seq 3465881361, ack 1556080701, win 65160, options [mss 1460,sackOK,TS val 3168643538 ecr 765965336,nop,wscale 7], length 0
>> 08:18:05.958699 net2  In  IP 194.5.220.43.41484 > 195.141.200.73.80: Flags [.], ack 1, win 502, options [nop,nop,TS val 765965342 ecr 3168643538], length 0
>> 08:18:05.958749 net2  In  IP 194.5.220.43.41484 > 195.141.200.73.80: Flags [P.], seq 1:79, ack 1, win 502, options [nop,nop,TS val 765965342 ecr 3168643538], length 78: HTTP: GET / HTTP/1.1
>> 08:18:05.958763 net2  Out IP 195.141.200.73.80 > 194.5.220.43.41484: Flags [.], ack 79, win 509, options [nop,nop,TS val 3168643545 ecr 765965342], length 0
>> 08:18:05.959216 net2  Out IP 195.141.200.73.80 > 194.5.220.43.41484: Flags [P.], seq 1:239, ack 79, win 509, options [nop,nop,TS val 3168643546 ecr 765965342], length 238: HTTP: HTTP/1.1 200 OK
>> 08:18:05.959327 net2  Out IP 195.141.200.73.80 > 194.5.220.43.41484: Flags [P.], seq 239:854, ack 79, win 509, options [nop,nop,TS val 3168643546 ecr 765965342], length 615: HTTP
>> 08:18:05.965244 net2  In  IP 194.5.220.43.41484 > 195.141.200.73.80: Flags [.], ack 239, win 501, options [nop,nop,TS val 765965350 ecr 3168643546], length 0
>> 08:18:05.965348 net2  In  IP 194.5.220.43.41484 > 195.141.200.73.80: Flags [.], ack 854, win 497, options [nop,nop,TS val 765965350 ecr 3168643546], length 0
>> 08:18:05.965487 net2  In  IP 194.5.220.43.41484 > 195.141.200.73.80: Flags [F.], seq 79, ack 854, win 501, options [nop,nop,TS val 765965350 ecr 3168643546], length 0
>> 08:18:05.965573 net2  Out IP 195.141.200.73.80 > 194.5.220.43.41484: Flags [F.], seq 854, ack 80, win 509, options [nop,nop,TS val 3168643552 ecr 765965350], length 0
>> 08:18:05.971916 net2  In  IP 194.5.220.43.41484 > 195.141.200.73.80: Flags [.], ack 855, win 501, options [nop,nop,TS val 765965356 ecr 3168643552], length 0
>>
>>
>>
>> [0]
>> wireguard "server" that changes the source ip:
>>
>> / # ip a
>> 1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1000
>>      link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
>>      inet 127.0.0.1/8 scope host lo
>>         valid_lft forever preferred_lft forever
>>      inet6 ::1/128 scope host
>>         valid_lft forever preferred_lft forever
>> 3: eth0@if29: <BROADCAST,MULTICAST,UP,LOWER_UP,M-DOWN> mtu 1500 qdisc noqueue state UP
>>      link/ether 66:4a:9c:12:5b:6c brd ff:ff:ff:ff:ff:ff
>>      inet6 2a0a:e5c0:10:1e:7f21:83ca:a7d:46d2/128 scope global
>>         valid_lft forever preferred_lft forever
>>      inet6 fe80::644a:9cff:fe12:5b6c/64 scope link
>>         valid_lft forever preferred_lft forever
>> 4: net1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000
>>      link/ether 3c:ec:ef:cb:d8:1b brd ff:ff:ff:ff:ff:ff
>>      inet 147.78.195.254/27 brd 147.78.195.255 scope global net1
>>         valid_lft forever preferred_lft forever
>>      inet6 2a0a:e5c0:1:8::53/64 scope global
>>         valid_lft forever preferred_lft forever
>>      inet6 fe80::3eec:efff:fecb:d81b/64 scope link
>>         valid_lft forever preferred_lft forever
>> 5: v1477819464: <POINTOPOINT,NOARP,UP,LOWER_UP> mtu 1420 qdisc noqueue state UNKNOWN qlen 1000
>>      link/[65534]
>>      inet 147.78.194.65/26 scope global v1477819464
>>         valid_lft forever preferred_lft forever
>>      inet6 2a0a:e5c0:2e::1/64 scope global
>>         valid_lft forever preferred_lft forever
>> 26: net2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000
>>      link/ether 3c:ec:ef:cb:d8:1c brd ff:ff:ff:ff:ff:ff
>>      inet 195.141.200.73/31 scope global net2
>>         valid_lft forever preferred_lft forever
>>      inet6 2001:1700:3500:2::12/124 scope global
>>         valid_lft forever preferred_lft forever
>>      inet6 fe80::3eec:efff:fecb:d81c/64 scope link
>>         valid_lft forever preferred_lft forever
>> / #
>>
>> wireguard client behind nat:
>>
>> nb3:/etc/wireguard# curl -4 ifconfig.io
>> 194.5.220.43
>> nb3:/etc/wireguard# ip a sh dev wlan0
>> 2: wlan0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
>>      link/ether 84:5c:f3:ed:52:9c brd ff:ff:ff:ff:ff:ff
>>      inet 192.168.4.85/24 brd 192.168.4.255 scope global dynamic noprefixroute wlan0
>>         valid_lft 317sec preferred_lft 242sec
>>      inet6 2a0a:e5c0:13:0:865c:f3ff:feed:529c/64 scope global dynamic mngtmpaddr noprefixroute
>>         valid_lft 86394sec preferred_lft 14394sec
>>      inet6 fe80::865c:f3ff:feed:529c/64 scope link
>>         valid_lft forever preferred_lft forever
>> nb3:/etc/wireguard#
>>
>>
>> [1]
>> / # ip route get 194.5.220.43
>> 194.5.220.43 via 195.141.200.72 dev net2  src 195.141.200.73
>> / #
>>
>>
>> Mike O'Connor <mike@pineview.net> writes:
>>
>>> Generally, when sending from a local process, all OSs will use the
>>> address of the outgoing interface for the packet.
>>>
>>> If the packet is forwarded and no NAT is used the address will be
>>> routed via the interface suggested by the routing table.
>>>
>>> So local routing can be a real pain, policy based routing is an
>>> option. The other option could be to setup an 'output' NAT to an
>>> address which is multi-homed.
>>>
>>> I have a system running which is multi-homed without issue other than
>>> the actual routing machine. This machine is BGP connected to three
>>> locations.
>>>
>>> There is no NAT set up; that works because I also add the wireguard
>>> link addresses to the BGP sessions.
>>>
>>> Cheers
>>>
>>>
>>>
>>> On 19/2/2023 6:44 am, Nico Schottelius wrote:
>>>> Dear group,
>>>>
>>>> I was wondering how wireguard [Linux kernel] or wireguard-go [FreeBSD]
>>>> are supposed to decide which IP address to use for replying?
>>>>
>>>> I have seen both on FreeBSD and Linux that wireguard seems to use the IP
>>>> address of the outgoing interface, i.e. the one with the route returning
>>>> to the sender. However in multi homed situations, this can be wrong,
>>>> let's take this example:
>>>>
>>>>         19:57:24.607526 net1  In  IP 194.5.220.43.60770 > 147.78.195.254.51820: UDP, length 148
>>>>         19:57:24.608358 net2  Out IP 195.141.200.73.51820 > 194.5.220.43.60770: UDP, length 92
>>>>
>>>> The initiator sends from 194.5.220.43 to the receiver 147.78.195.254.
>>>> Wireguard then replies with the source IP of 195.141.200.73 instead of
>>>> 147.78.195.254.
>>>>
>>>> As the node is multi homed, the packet might leave through any of its
>>>> uplinks and thus return with a random (unexpected) IP address and will
>>>> not pass NAT rules on firewalls and finally be dropped. F.i. in above
>>>> example the firewall drops the packet from 195.141.200.73, because there
>>>> is no session entry for that.
>>>>
>>>> I have observed this behaviour both on Linux 6.1.11 as well as
>>>> wireguard-go 0.0.20220316_8,1 on FreeBSD and in both cases the
>>>> connection will break depending on which active interface is taken as
>>>> exit.
>>>>
>>>> I would argue that wireguard should by default invert the IP
>>>> addresses, i.e. switch dst=src, src=dst and then reply with that,
>>>> instead of adapting an interface specific address, or is there a good
>>>> reason for the current behaviour?
>>>>
>>>> Best regards,
>>>>
>>>> Nico
>>>>
>>>> --
>>>> Sustainable and modern Infrastructures by ungleich.ch
>
> --
> Sustainable and modern Infrastructures by ungleich.ch
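Nico's captures make the difference easy to check mechanically: for ping and HTTP the reply's source equals the request's destination, for WireGuard it does not. A small Python helper for comparing a request/reply pair; `endpoints` and `reply_is_symmetric` are hypothetical names, and the helper only handles IPv4 lines in the `tcpdump -n` format shown in this thread.

```python
import re

# Matches the "IP src[.port] > dst[.port]:" part of a tcpdump line (IPv4 only).
_ADDRS = re.compile(
    r"\bIP (\d+\.\d+\.\d+\.\d+)(?:\.(\d+))? > (\d+\.\d+\.\d+\.\d+)(?:\.(\d+))?:"
)


def endpoints(line):
    """Return ((src, sport), (dst, dport)); ports are None for ICMP."""
    m = _ADDRS.search(line)
    if m is None:
        raise ValueError(f"not an IPv4 tcpdump line: {line!r}")
    src, sport, dst, dport = m.groups()
    return (src, sport), (dst, dport)


def reply_is_symmetric(request_line, reply_line):
    """True if the reply comes from exactly the request's destination
    (address and port) and goes back to the request's source."""
    req_src, req_dst = endpoints(request_line)
    rep_src, rep_dst = endpoints(reply_line)
    return rep_src == req_dst and rep_dst == req_src
```

Fed the two WireGuard lines from the original report, `reply_is_symmetric` returns `False` (the reply comes from 195.141.200.73); fed the first ping request/reply pair above, it returns `True`.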

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Source IP incorrect on multi homed systems
  2023-02-19 14:39         ` Christoph Loesch
  2023-02-19 16:32           ` David Kerr
@ 2023-02-19 20:02           ` Nico Schottelius
  1 sibling, 0 replies; 35+ messages in thread
From: Nico Schottelius @ 2023-02-19 20:02 UTC (permalink / raw)
  To: Christoph Loesch; +Cc: wireguard


Hello Christoph,

Christoph Loesch <wireguard-mail@chil.at> writes:
> @Nico: did you try to delete the affected route and add it again with the correct source IP ?

No, I did not because the routes are really dynamic on the affected
systems and I would need to overwrite the BGP routes with a better
metric, which in turn will likely break the return path.

> as I mentioned in https://lists.zx2c4.com/pipermail/wireguard/2021-November/007324.html
>
> ip route del <NET>
> ip route add <NET> dev <ALIAS_DEV> src <SRC_IP>
>
> This way I was able to (at least temporarily) fix this issue on multi homed systems.

Much appreciate the hint. However changing routes manually on as many
routers/vpn endpoints as we have is not a practical solution. To fix the
current project's issue we have shifted the VPN endpoint to a single
homed device for the moment.

Best regards,

Nico

--
Sustainable and modern Infrastructures by ungleich.ch

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Source IP incorrect on multi homed systems
  2023-02-19 18:04               ` Janne Johansson
  2023-02-19 18:08                 ` Sebastian Hyrvall
@ 2023-02-19 20:11                 ` Nico Schottelius
  1 sibling, 0 replies; 35+ messages in thread
From: Nico Schottelius @ 2023-02-19 20:11 UTC (permalink / raw)
  To: Janne Johansson; +Cc: Sebastian Hyrvall, wireguard


Hey Janne,

Janne Johansson <icepic.dz@gmail.com> writes:
> *) https://en.wiktionary.org/wiki/Chesterton%27s_fence

I am happy to have learned a new principle today, thanks for that.

And to be sure that everyone is on the same page:

    Wireguard should reply by default with the source address that
    used to be the destination address, but at the moment wireguard is not
    doing that.

If anyone disagrees with the above statement, please let me know.

--
Sustainable and modern Infrastructures by ungleich.ch

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Source IP incorrect on multi homed systems
  2023-02-19 18:42               ` tlhackque
@ 2023-02-19 20:18                 ` Nico Schottelius
  2023-02-19 20:42                   ` Roman Mamedov
  0 siblings, 1 reply; 35+ messages in thread
From: Nico Schottelius @ 2023-02-19 20:18 UTC (permalink / raw)
  To: tlhackque; +Cc: wireguard


tlhackque <tlhackque@yahoo.com> writes:
>> [...]
>> 4.1 <https://www.rfc-editor.org/rfc/rfc2181.html#section-4.1>. UDP
>> Source Address Selection
>>
>>     ***To avoid these problems, servers when responding to queries
>> using UDP _must _cause the reply to be sent with the source address
>> field in the IP header set to the address that was in the
>> destination address field of the IP header of the packet containing
>> the query causing the response.** *

OMG, we really have seen everything already, haven't we?

Jason, what do you think about adopting the RFC2181 Source Address
Selection algorithm for wireguard?

If I am not mistaken that would mean in practice:

   if original_pkt.ip.dst == one_of_my_ips then
      return_pkt.ip.src = original_pkt.ip.dst
      return_pkt.ip.dst = original_pkt.ip.src
   fi

For me that sounds like a sane approach (aside from
my very simplified algorithm).
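A minimal C sketch of that pseudocode, assuming IPv4 only and addresses in host byte order. struct udp_tuple, is_local_addr() and build_reply() are hypothetical names, not anything in WireGuard. A source of 0 stands for "let the kernel choose", i.e. today's behaviour:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical IPv4 UDP 4-tuple; addresses in host byte order. */
struct udp_tuple {
	uint32_t src, dst;
	uint16_t sport, dport;
};

/* Hypothetical helper: is addr one of this host's addresses? */
static int is_local_addr(uint32_t addr, const uint32_t *local, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (local[i] == addr)
			return 1;
	return 0;
}

/*
 * Swap source and destination; reuse the original destination as the
 * reply's source only if it really is ours.  src == 0 means "let the
 * kernel choose", which is what happens today.
 */
static struct udp_tuple build_reply(const struct udp_tuple *orig,
				    const uint32_t *local, size_t n)
{
	struct udp_tuple reply = {
		.src = 0,
		.dst = orig->src,
		.sport = orig->dport,
		.dport = orig->sport,
	};

	if (is_local_addr(orig->dst, local, n))
		reply.src = orig->dst;
	return reply;
}
```

Fed the datagram from the trace at the top of the thread (194.5.220.43:60770 to 147.78.195.254:51820), build_reply() picks 147.78.195.254 as the source instead of 195.141.200.73.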

Best regards,

Nico

--
Sustainable and modern Infrastructures by ungleich.ch

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Source IP incorrect on multi homed systems
  2023-02-19 20:18                 ` Nico Schottelius
@ 2023-02-19 20:42                   ` Roman Mamedov
  2023-02-19 21:19                     ` Nico Schottelius
  2023-02-19 21:39                     ` Source IP incorrect on multi homed systems tlhackque
  0 siblings, 2 replies; 35+ messages in thread
From: Roman Mamedov @ 2023-02-19 20:42 UTC (permalink / raw)
  To: Nico Schottelius; +Cc: tlhackque, wireguard

On Sun, 19 Feb 2023 21:18:34 +0100
Nico Schottelius <nico.schottelius@ungleich.ch> wrote:

> If I am not mistaken that would mean in practice:
> 
>    if orignal_pkg.ip_dst == one_of_my_ips then
>       return_pkg.ip.src = orignal_pkg.ip_dst
>       return_pkg.ip.dst = orignal_pkg.ip_src
>    fi
> 
> For me that sounds like a sane approach (aside from
> my very simplified algorithm).

Except there is no request and response in WG, and as such no original or
return packet. Another peer contacts you, then some time later you contact the
other peer. Or the other way round.

WG-wise, what will need to be done is to store in each peer's information
structure the local IP that we are supposed to use for communication with that
peer, and to update it when receiving packets from the peer, using the
destination of those. So you would see a "Local IP" in each "peer" section
when doing a "wg show".

Also, until such an IP is stored, it will have to be some default
outgoing IP of the system towards that peer. BTW, how would this work in your
setup if, instead of the peer contacting you first, your machine needs to
contact the peer?
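A rough C sketch of the per-peer state described above, assuming IPv4 and host byte order. struct peer_state, peer_learn() and peer_source() are hypothetical and do not correspond to WireGuard's actual data structures. The learned local address is simply the destination of the last authenticated packet from the peer, and 0 means "nothing learned yet, use the system default":

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-peer state (IPv4, host byte order). */
struct peer_state {
	uint32_t endpoint_ip;	/* peer's address, updated on every receive */
	uint32_t local_ip;	/* our address the peer last sent to; 0 = unknown */
};

/* Call for every authenticated packet received from the peer. */
static void peer_learn(struct peer_state *p, uint32_t pkt_src, uint32_t pkt_dst)
{
	p->endpoint_ip = pkt_src;	/* endpoint roaming, as WireGuard already does */
	p->local_ip = pkt_dst;		/* new: remember which local IP was used */
}

/* Source address for packets to this peer; 0 = system default. */
static uint32_t peer_source(const struct peer_state *p)
{
	return p->local_ip;
}
```

Until the peer has sent anything, peer_source() returns 0, which is exactly the open case raised in the question about who contacts whom first.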

-- 
With respect,
Roman

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Source IP incorrect on multi homed systems
  2023-02-19 20:42                   ` Roman Mamedov
@ 2023-02-19 21:19                     ` Nico Schottelius
  2023-02-19 22:06                       ` tlhackque
  2023-02-19 22:42                       ` Src addr code review (Was: Source IP incorrect on multi homed systems) Daniel Gröber
  2023-02-19 21:39                     ` Source IP incorrect on multi homed systems tlhackque
  1 sibling, 2 replies; 35+ messages in thread
From: Nico Schottelius @ 2023-02-19 21:19 UTC (permalink / raw)
  To: Roman Mamedov; +Cc: Nico Schottelius, tlhackque, wireguard


Hey Roman,

Roman Mamedov <rm@romanrm.net> writes:

> On Sun, 19 Feb 2023 21:18:34 +0100
> Nico Schottelius <nico.schottelius@ungleich.ch> wrote:
>
>> If I am not mistaken that would mean in practice:
>>
>>    if orignal_pkg.ip_dst == one_of_my_ips then
>>       return_pkg.ip.src = orignal_pkg.ip_dst
>>       return_pkg.ip.dst = orignal_pkg.ip_src
>>    fi
>>
>> For me that sounds like a sane approach (aside from
>> my very simplified algorithm).
>
> Except there is no request and response in WG, and as such no original or
> return packet. Another peer contacts you, then some time later you contact the
> other peer. Or the other way round.
>
> WG-wise what will need to be done is to store in the each peer's information
> structure the local IP that we are supposed to use for communication with that
> peer; and updating it when receiving packets from the peer, using the
> destination of those. So you would see a "Local IP" in each "peer" section
> when doing a "wg show".

That is very interesting, thanks for the insight. Reading above
paragraph, I was having a very similar thought that we need to record
the local IP.

> Also, until there is such IP initially stored, it will have to be some default
> outgoing IP of the system towards that peer. BTW, how would this work in your
> setup, what if not the peer contacts you first, but your machine needs to
> contact the peer?

So far this situation doesn't exist for us, because only servers are
multi homed.

However, having an option to specify a local address in each peer
section would probably be a good way to disambiguate it; if it is not
specified, use the default, i.e. whatever other processes that don't
define it explicitly are using, following the principle of least
surprise.
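A sketch of the precedence suggested here, assuming a hypothetical per-peer local-address setting (wg(8) has no such option); the names are illustrative only. An explicitly configured address wins, then the address learned from the peer's traffic, and otherwise 0 leaves source selection to the kernel, as for any other process that does not set one:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical peer fields (IPv4, host byte order); not real wg(8) options. */
struct peer_src_cfg {
	uint32_t configured;	/* operator-set local address, 0 = unset */
	uint32_t learned;	/* last destination seen from the peer, 0 = none */
};

/* configured > learned > 0 (kernel picks, i.e. the current behaviour) */
static uint32_t pick_source(const struct peer_src_cfg *c)
{
	if (c->configured)
		return c->configured;
	if (c->learned)
		return c->learned;
	return 0;
}
```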

Best regards,

Nico

--
Sustainable and modern Infrastructures by ungleich.ch

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Source IP incorrect on multi homed systems
  2023-02-19 20:42                   ` Roman Mamedov
  2023-02-19 21:19                     ` Nico Schottelius
@ 2023-02-19 21:39                     ` tlhackque
  1 sibling, 0 replies; 35+ messages in thread
From: tlhackque @ 2023-02-19 21:39 UTC (permalink / raw)
  To: Roman Mamedov, Nico Schottelius; +Cc: wireguard


[-- Attachment #1.1: Type: text/plain, Size: 2964 bytes --]

On 19-Feb-23 15:42, Roman Mamedov wrote:
> On Sun, 19 Feb 2023 21:18:34 +0100
> Nico Schottelius<nico.schottelius@ungleich.ch>  wrote:
>
>> If I am not mistaken that would mean in practice:
>>
>>     if orignal_pkg.ip_dst == one_of_my_ips then
>>        return_pkg.ip.src = orignal_pkg.ip_dst
>>        return_pkg.ip.dst = orignal_pkg.ip_src
>>     fi
>>
>> For me that sounds like a sane approach (aside from
>> my very simplified algorithm).
> Except there is no request and response in WG, and as such no original or
> return packet. Another peer contacts you, then some time later you contact the
> other peer. Or the other way round.
>
> WG-wise what will need to be done is to store in the each peer's information
> structure the local IP that we are supposed to use for communication with that
> peer; and updating it when receiving packets from the peer, using the
> destination of those. So you would see a "Local IP" in each "peer" section
> when doing a "wg show".
>
> Also, until there is such IP initially stored, it will have to be some default
> outgoing IP of the system towards that peer. BTW, how would this work in your
> setup, what if not the peer contacts you first, but your machine needs to
> contact the peer?
>
The situation can be (and often is) the same for both peers.

If you're the initiator, you send to the peer address using its 
configured or DNS IP address, and normal routing.  You note the address 
used to send, and use it for future communications to that peer.  The 
first packet sets state in the posited firewall/nat. Subsequent packets 
using the same source address ensure that the firewall sees them as the 
same flow.
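The "note the address used to send" step can be approximated from userspace without sending anything: connect() on a UDP socket only performs the route lookup and fixes the local address, which getsockname() then reports. A sketch assuming POSIX sockets and IPv4; kernel_source_for() is a hypothetical helper name:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper: which local IPv4 address would the kernel use to
 * reach peer_ip:port?  connect() on a UDP socket sends no packet; it does
 * the route lookup and fixes the local address, which getsockname() reports.
 * Returns 0 and stores the address (network byte order) in *out, else -1.
 */
static int kernel_source_for(const char *peer_ip, uint16_t port,
			     struct in_addr *out)
{
	struct sockaddr_in peer, local;
	socklen_t len = sizeof(local);
	int fd, ret = -1;

	memset(&peer, 0, sizeof(peer));
	peer.sin_family = AF_INET;
	peer.sin_port = htons(port);
	if (inet_pton(AF_INET, peer_ip, &peer.sin_addr) != 1)
		return -1;

	fd = socket(AF_INET, SOCK_DGRAM, 0);
	if (fd < 0)
		return -1;
	if (connect(fd, (struct sockaddr *)&peer, sizeof(peer)) == 0 &&
	    getsockname(fd, (struct sockaddr *)&local, &len) == 0) {
		*out = local.sin_addr;
		ret = 0;
	}
	close(fd);
	return ret;
}
```

On a multi-homed host this only reports what the routing table would pick at that moment; remembering the result per peer, rather than asking again for every packet, is what would keep the flow stable.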

When the peer gets around to saying something (which it will at the latest
when the keepalive timer goes off, but probably sooner), it will have
noted your source address and its local IP address (the one you used).
So it will send using the source address that you know about.

This is the same algorithm used by the peer, so they should agree.

When either end detects an address change, the process restarts.

There is a possibility that the initial packets pass in flight, but I 
think that would at most result in a dropped packet, which will be resent.

I don't think there's a deadlock, but in the event of thrashing, a 
tie-breaker of using the lowest candidate IP address generally works.
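The tie-breaker can be as simple as a numeric minimum over the candidate local addresses. A sketch assuming IPv4 candidates in host byte order (so numeric order matches dotted-quad order), with a hypothetical function name:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical tie-breaker: among candidate local IPv4 addresses (host
 * byte order), pick the numerically lowest.  Returns 0 if n == 0.
 */
static uint32_t lowest_candidate(const uint32_t *cand, size_t n)
{
	uint32_t best = 0;

	for (size_t i = 0; i < n; i++)
		if (i == 0 || cand[i] < best)
			best = cand[i];
	return best;
}
```

Applied on each side, this only makes that side's choice deterministic; it does not by itself coordinate the two ends.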

When there are multiple choices, it doesn't really matter which pair of 
IP addresses is picked, as long as they're stable while the systems 
reside on the same networks.  (E.g., it could be two notebook PCs in 
different hotel rooms, not just two fixed servers or one fixed server 
and one mobile.)  The goal is to establish a flow that stateful packet 
inspection, NAT and routing can recognize and use to keep a pinhole open...

I don't have time at the moment to work out the corner cases, but that's 
the overall approach.



[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 840 bytes --]

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Source IP incorrect on multi homed systems
  2023-02-19 21:19                     ` Nico Schottelius
@ 2023-02-19 22:06                       ` tlhackque
  2023-02-19 22:42                       ` Src addr code review (Was: Source IP incorrect on multi homed systems) Daniel Gröber
  1 sibling, 0 replies; 35+ messages in thread
From: tlhackque @ 2023-02-19 22:06 UTC (permalink / raw)
  To: WireGuard Mailing list


[-- Attachment #1.1: Type: text/plain, Size: 691 bytes --]

On 19-Feb-23 16:19, Nico Schottelius wrote:
> So far this situation doesn't exist for us, because only servers are
> multi homed.

It's not that uncommon; consider a docked notebook that has a WiFi 
address and an Ethernet address on the same subnet.

While typically the routing priorities favor the Ethernet, the mobile 
will have both addresses.

In a car, you can have WiFi through the car and mobile data.  (Not saying I 
like this, but...)

There are probably other cases, but I wouldn't assume it's only a server 
issue.

As I also noted in another note: two servers can have the same issue, if 
both are multi-homed.

The solution really needs to be symmetric.



[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 840 bytes --]

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Source IP incorrect on multi homed systems
       [not found]               ` <CADGd2DoE6TCtCxxWL7JWyNW5+yy_Pe+9MNzHznbudMWLTXQreA@mail.gmail.com>
  2023-02-19 18:30                 ` Fwd: " John Lauro
@ 2023-02-19 22:28                 ` tlhackque
  2023-02-20  0:58                   ` Luiz Angelo Daros de Luca
  1 sibling, 1 reply; 35+ messages in thread
From: tlhackque @ 2023-02-19 22:28 UTC (permalink / raw)
  To: John Lauro; +Cc: wireguard


[-- Attachment #1.1: Type: text/plain, Size: 7599 bytes --]

Actually in my case (I'm not the originator of this thread), I don't run 
BGP.  But I do have both site-site and mobile-site clients.  Much 
simpler environment, but same issue.

I do understand UDP.

As I've noted, DNS UDP has the same issue, and an RFC was issued to 
clarify that responses MUST come from the address on which a query is 
received.

WG isn't quite the same, as it isn't a request/response protocol.  But 
it is a flow between two endpoints, and NAT/firewalls will open a 
pinhole for incoming packets when they see an outbound packet.
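A deliberately simplified model of why the reply in the original trace is dropped: the firewall keeps state for the outbound UDP flow and accepts an inbound packet only if it is the exact reverse of that flow. struct flow and reply_matches() are hypothetical; real connection tracking does considerably more (timeouts, NAT mappings).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, heavily simplified state entry for one UDP flow
 * (IPv4 addresses in host byte order). */
struct flow {
	uint32_t saddr, daddr;
	uint16_t sport, dport;
};

/* An inbound packet passes only if it is the exact reverse of a flow
 * that an earlier outbound packet created. */
static int reply_matches(const struct flow *state, const struct flow *in)
{
	return in->saddr == state->daddr && in->daddr == state->saddr &&
	       in->sport == state->dport && in->dport == state->sport;
}
```

With the addresses from the first message, a reply sourced from 147.78.195.254 matches the state created by 194.5.220.43's outbound packet, while one sourced from 195.141.200.73 does not.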

One of the nice things about WG is that except for this issue, it has no 
dependencies on custom routing (or anything but UDP) and "just works".  
It should "just work" on multihomed hosts, without handstands, BGP 
routing, different ports, and the like.  It also needs to work where 
it's not feasible to layer on work-arounds, such as VPSs where you don't 
get to pick your kernel...or your firewall.

Picking stable endpoint addresses would make the traffic look like the 
kind of flow that these middleboxes recognize, and things would "just 
work".

On 19-Feb-23 13:25, John Lauro wrote:
> I think the ip route with src would work, but only as a short-lived
> workaround.  The problem with that, if dealing with dynamic routes,
> is that it could go away when a link is down and then come back, and the
> src setting would be lost.  You would need the BGP software to add the
> src.
>
> UDP is connectionless.  Sending back out the same as it's coming in 
> isn't strictly the same.  The streams are not attached the same as 
> they would be with TCP on nginx or a reply with icmp. You should be 
> able to whitelist the udp port on the NAT devices, as it shouldn't use 
> state info.
>
> I am not sure if you are attempting to do site to site or client to 
> server/site and which end has the NAT (or both). What I do for site to 
> site is use a different port for each connection and have a separate 
> BGP connection for each possible connection (ie: different one for 
> different network providers).  Have a full mesh with 8 sites and upto 
> 3 providers per site.
>
> That said, you probably have floating IPs on the client side, and
> don't want to lock in a single IP on the multi-homed server side?  You
> could NAT the incoming IPs on the border from an internal IP and then
> lock to a single private IP on the wireguard server for in/out,
> and that border NAT would force the reply back to the same gateway it
> came in from.
>
> I know, you don't want workarounds, just want to mention it's not the
> same as comparing a single stream to something that handles routing
> through it.  As you are doing BGP and redundant routes, I assume you
> also reset rp_filter on all NAT/wireguard/routers so the routers will
> allow packets to come from different sources.
>
> On Sun, Feb 19, 2023 at 12:07 PM tlhackque <tlhackque@yahoo.com> wrote:
>
>     FWIW, while clever, I don't think that iptables mark solves all
>     cases.
>     E.g., consider an interface with multiple addresses, where a packet
>     comes in on a secondary address.  The proposed rule would send it out
>     the right interface, but still with the wrong (primary) address
>     picked from the interface...
>
>     With IPv6 it's common to assign an address to a service rather than a
>     host so services can move easily.  So multiple addresses per interface
>     are the rule, not the exception.
>
>     I do the same with IPv4 inside addresses, though these days public IPv4
>     addresses are scarce enough that it's not common for public IPs.  It
>     amounts to the same issue - the NAT tracking is stateful.
>
>     Trying to work around this with routing seems like a maze of twisty
>     passages - so I agree that the right solution is for WG to respond from
>     the address that receives a packet.
>
>     On 19-Feb-23 11:32, David Kerr wrote:
>     > Without getting into the debate of whether wireguard is acting
>     > correctly or not, I think there is a possible workaround.
>     >
>     > 1. In the iptables mangle table PREROUTING, match the incoming
>     > interface and destination address and --set-xmark a firewall MARK
>     > unique to this interface/destination
>     > 2. Create a new ip route table that sets the default route to go out
>     > on the interface with the source address you want (same as destination
>     > address in iptables)
>     > 3. Create a new ip rule that sends all packets with firewall mark set
>     > in iptables to the routing table you just created
>     >
>     > Repeat above for each interface/address you need to mangle, with a
>     > unique firewall mark and routing table for each.
>     >
>     > It may be necessary to use CONNMARK in PREROUTING and OUTPUT to
>     > --restore-mark.  I can't remember if this is needed or not, it's been a
>     > while since I configured iptables with this.
>     >
>     > This should ensure that any packet that comes into an
>     > interface/address is replied to from the same interface/address.
>     >
>     > David
>     >
>     >
>     > On Sun, Feb 19, 2023 at 9:44 AM Christoph Loesch <wireguard-mail@chil.at> wrote:
>     >> Hi,
>     >>
>     >> I don't think it's that no one wants to fix it; there are several
>     >> users having this issue. I rather guess no one could find a suitable
>     >> solution to fix it.
>     >>
>     >> @Nico: did you try to delete the affected route and add it again
>     >> with the correct source IP?
>     >>
>     >> as I mentioned in
>     >> https://lists.zx2c4.com/pipermail/wireguard/2021-November/007324.html
>     >>
>     >> ip route del <NET>
>     >> ip route add <NET> dev <ALIAS_DEV> src <SRC_IP>
>     >>
>     >> This way I was able to (at least temporarily) fix this issue on
>     >> multi homed systems.
>     >>
>     >> Kind regards,
>     >> Christoph
>     >>
>     >> On 19.02.2023 at 13:13, Nico Schottelius wrote:
>     >>> Hey Sebastian,
>     >>>
>     >>> Sebastian Hyrwall <sh@keff.org> writes:
>     >>>
>     >>>> It is kinda. It's been mentioned multiple times over the years but
>     >>>> no one seems to want to fix it. At least you should be able to
>     >>>> specify bind/src ip in the config. I gave up WG because of it.
>     >>>> Wasn't accepted by my project's security policy since src ip could
>     >>>> not be configured.
>     >>>>
>     >>>> There is an unofficial patch however,
>     >>>>
>     >>>> https://github.com/torvalds/linux/commit/5fa98082093344c86345f9f63305cae9d5f9f281
>     >>>
>     >>> the binding is somewhat related to this issue and I was looking for
>     >>> that feature some time ago, too. While it is correlated and I would
>     >>> really appreciate binding support, I am not sure whether the linked
>     >>> patch does actually fix the problem I am seeing in multi homed devices.
>     >>>
>     >>> As long as wireguard does not reply with the same IP address it was
>     >>> contacted with, packets will get dropped on stateful firewalls, because
>     >>> the returning packet does not match the state session database.
>     >>>
>     >>> Best regards,
>     >>>
>     >>> Nico
>     >>>
>     >>> --
>     >>> Sustainable and modern Infrastructures by ungleich.ch
>


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 840 bytes --]

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Src addr code review (Was: Source IP incorrect on multi homed systems)
  2023-02-19 21:19                     ` Nico Schottelius
  2023-02-19 22:06                       ` tlhackque
@ 2023-02-19 22:42                       ` Daniel Gröber
  2023-02-20  0:28                         ` 曹煜
  2023-02-20  9:47                         ` Nico Schottelius
  1 sibling, 2 replies; 35+ messages in thread
From: Daniel Gröber @ 2023-02-19 22:42 UTC (permalink / raw)
  To: Nico Schottelius; +Cc: Roman Mamedov, tlhackque, wireguard

Hi,

I thought it might be useful to do some quick and dirty code review instead
of speculating wildly to figure out where these source IP selection
problems could be coming from ;)

From previous code deep dives I know the udp_tunnel_xmit_skb function is
where tunnel packets get handed off to the kernel. So in
net/wireguard/socket.c:send4 we have:

	udp_tunnel_xmit_skb(rt, sock, skb, fl.saddr, fl.daddr, ds,
			    ip4_dst_hoplimit(&rt->dst), 0, fl.fl4_sport,
			    fl.fl4_dport, false, false);

Where fl.saddr is the source address that's supposedly wrong (sometimes? I
guess?) Where does that come from?

Let's look at the code (heavily culled):

	struct flowi4 fl = {
		.saddr = endpoint->src4.s_addr,
	};
	if (cache)
		rt = dst_cache_get_ip4(cache, &fl.saddr);
	if (!rt) {
		if (unlikely(!inet_confirm_addr(sock_net(sock), NULL, 0,
						fl.saddr, RT_SCOPE_HOST)))
			fl.saddr = 0;
		rt = ip_route_output_flow(sock_net(sock), &fl, sock);
		if (unlikely(endpoint->src_if4 && ((IS_ERR(rt) &&
			     PTR_ERR(rt) == -EINVAL) || (!IS_ERR(rt) &&
			     rt->dst.dev->ifindex != endpoint->src_if4))))
			fl.saddr = 0;
		/* ... */
	}

Well, it's initialized from endpoint->src4.s_addr, overwritten with zero in
some cases (which I believe lets the kernel do its regular source address
selection), and populated from something called dst_cache at some call sites.

@Nico, could it perhaps simply be that you're hitting one of these zeroing
cases, and that's why it's using regular kernel src addr selection instead
of the cached endpoint src4 address?

The first case, !inet_confirm_addr(..., RT_SCOPE_HOST), ought to confirm that
the saddr is actually still a local address. Makes sense: if the address we
remembered was removed from the interface, we can't use it anymore.

The second case looks like it's checking if the (sometimes cached) src_if4
interface index is still what the route we're about to use points to.
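A small userspace model of the two checks, to make the conditions explicit. model_saddr() and its parameters are hypothetical stand-ins for inet_confirm_addr() and the route lookup result, not kernel APIs. Assuming src_if4 records the interface the peer's packets arrived on, the second case would fire whenever the route back to the peer leaves through a different interface, which would be consistent with the "In net1 / Out net2" trace from the first message:

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/*
 * Hypothetical model of the two checks above (dst_cache miss path).
 *   cached_src     endpoint->src4 (0 if none)
 *   still_local    result of the inet_confirm_addr() check
 *   cached_ifindex endpoint->src_if4 (0 if none)
 *   route_err      0 if the route lookup succeeded, else a negative errno
 *   route_ifindex  outgoing interface of that route (if it succeeded)
 * Returns the source address that ends up being used; 0 = kernel chooses.
 */
static uint32_t model_saddr(uint32_t cached_src, int still_local,
			    int cached_ifindex, int route_err, int route_ifindex)
{
	uint32_t saddr = cached_src;

	/* Case 1: the remembered address is no longer configured locally. */
	if (!still_local)
		saddr = 0;
	/* Case 2: lookup failed with -EINVAL, or the route leaves through a
	 * different interface than the one recorded in src_if4. */
	if (cached_ifindex && (route_err == -EINVAL ||
			       (route_err == 0 && route_ifindex != cached_ifindex)))
		saddr = 0;
	return saddr;
}
```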

If neither of those seem likely we can keep reading :)

--Daniel




^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Src addr code review (Was: Source IP incorrect on multi homed systems)
  2023-02-19 22:42                       ` Src addr code review (Was: Source IP incorrect on multi homed systems) Daniel Gröber
@ 2023-02-20  0:28                         ` 曹煜
  2023-02-20 10:40                           ` Nico Schottelius
  2023-02-20  9:47                         ` Nico Schottelius
  1 sibling, 1 reply; 35+ messages in thread
From: 曹煜 @ 2023-02-20  0:28 UTC (permalink / raw)
  To: Daniel Gröber; +Cc: Nico Schottelius, Roman Mamedov, tlhackque, wireguard

Hi all,
I hacked that source code myself months ago, and it works well for
my use case (I have 4 dual-stack PPPoE WANs on my OpenWrt router,
and set up a WireGuard server on it). My hack picks up the dst_addr
from the incoming handshake packet in the kernel sk_buff, and then uses
that addr as src_addr to reply.
I'm not good at source code, and I know that my hack may be ugly, but
it works; I hope this patch can help:
https://github.com/openwrt/packages/issues/9538#issuecomment-1150592803
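The patch does this inside the kernel module. For a userspace UDP implementation on Linux, the usual way to get the same information is the IP_PKTINFO socket option documented in ip(7): recvmsg() reports the destination address of each datagram, and sendmsg() accepts ipi_spec_dst as the source to use. A Linux-only sketch, not taken from the patch or from any WireGuard code; recv_with_dst() and send_from() are hypothetical helper names, and the socket must have IP_PKTINFO enabled with setsockopt() first:

```c
#define _GNU_SOURCE		/* struct in_pktinfo on glibc */
#include <assert.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>

/* Control buffer for one IP_PKTINFO message, aligned for cmsghdr. */
union pktinfo_cbuf {
	char buf[CMSG_SPACE(sizeof(struct in_pktinfo))];
	struct cmsghdr align;
};

/* Receive one datagram and report the local address it was sent to.
 * Returns the payload length, or -1 on error / no IP_PKTINFO message. */
static ssize_t recv_with_dst(int fd, void *buf, size_t len,
			     struct sockaddr_in *from, struct in_addr *dst)
{
	union pktinfo_cbuf cbuf;
	struct iovec iov = { .iov_base = buf, .iov_len = len };
	struct msghdr msg = {
		.msg_name = from, .msg_namelen = sizeof(*from),
		.msg_iov = &iov, .msg_iovlen = 1,
		.msg_control = cbuf.buf, .msg_controllen = sizeof(cbuf.buf),
	};
	ssize_t n = recvmsg(fd, &msg, 0);

	if (n < 0)
		return -1;
	for (struct cmsghdr *c = CMSG_FIRSTHDR(&msg); c; c = CMSG_NXTHDR(&msg, c)) {
		if (c->cmsg_level == IPPROTO_IP && c->cmsg_type == IP_PKTINFO) {
			struct in_pktinfo pi;

			memcpy(&pi, CMSG_DATA(c), sizeof(pi));
			*dst = pi.ipi_addr;	/* destination in the IP header */
			return n;
		}
	}
	return -1;
}

/* Send buf to `to`, requesting `src` as the source address. */
static ssize_t send_from(int fd, const void *buf, size_t len,
			 const struct sockaddr_in *to, struct in_addr src)
{
	union pktinfo_cbuf cbuf;
	struct iovec iov = { .iov_base = (void *)buf, .iov_len = len };
	struct msghdr msg = {
		.msg_name = (void *)to, .msg_namelen = sizeof(*to),
		.msg_iov = &iov, .msg_iovlen = 1,
		.msg_control = cbuf.buf, .msg_controllen = sizeof(cbuf.buf),
	};
	struct in_pktinfo pi;
	struct cmsghdr *c;

	memset(&cbuf, 0, sizeof(cbuf));
	memset(&pi, 0, sizeof(pi));
	/* ipi_ifindex stays 0, so ipi_spec_dst is used as the source
	 * address for the routing table lookup. */
	pi.ipi_spec_dst = src;
	c = CMSG_FIRSTHDR(&msg);
	c->cmsg_level = IPPROTO_IP;
	c->cmsg_type = IP_PKTINFO;
	c->cmsg_len = CMSG_LEN(sizeof(pi));
	memcpy(CMSG_DATA(c), &pi, sizeof(pi));
	return sendmsg(fd, &msg, 0);
}
```

A server would call recv_with_dst(), store the returned address per peer, and pass it to send_from() for later packets to that peer.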


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Source IP incorrect on multi homed systems
  2023-02-19 22:28                 ` tlhackque
@ 2023-02-20  0:58                   ` Luiz Angelo Daros de Luca
  0 siblings, 0 replies; 35+ messages in thread
From: Luiz Angelo Daros de Luca @ 2023-02-20  0:58 UTC (permalink / raw)
  To: tlhackque; +Cc: John Lauro, wireguard

Yes, wg is not a request/response protocol. But it does have some
state. Can't WireGuard remember the last local address to which each peer
sent traffic? It is just like the tracking already in use for the peer IP
address. If there is a "last address", it would be nice if we could
hint the kernel to use that as the source address, with a fallback to
the current behavior if the address is not available. It might solve a
couple of problems. I just don't know if it is possible to hint the
source address without enforcing it. If not, wg would have to deal
with cases when the address is gone.
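For the fallback part, one userspace approximation is to check that the remembered address is still assigned to some interface before using it, here with getifaddrs(3); this plays roughly the role inet_confirm_addr() plays in the kernel code quoted earlier in the thread. A sketch assuming IPv4; addr_still_local() and source_hint() are hypothetical names:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <arpa/inet.h>
#include <ifaddrs.h>
#include <netinet/in.h>
#include <sys/socket.h>

/*
 * Is addr (network byte order) still assigned to one of this host's
 * interfaces?  Returns 1 if yes, 0 if no, -1 if the list can't be read.
 */
static int addr_still_local(struct in_addr addr)
{
	struct ifaddrs *ifs, *i;
	int found = 0;

	if (getifaddrs(&ifs) != 0)
		return -1;
	for (i = ifs; i; i = i->ifa_next) {
		if (i->ifa_addr && i->ifa_addr->sa_family == AF_INET &&
		    ((struct sockaddr_in *)i->ifa_addr)->sin_addr.s_addr == addr.s_addr) {
			found = 1;
			break;
		}
	}
	freeifaddrs(ifs);
	return found;
}

/* Use the remembered source if it is still ours, else INADDR_ANY
 * (i.e. fall back to the kernel's normal source selection). */
static struct in_addr source_hint(struct in_addr remembered)
{
	struct in_addr none = { .s_addr = INADDR_ANY };

	return addr_still_local(remembered) == 1 ? remembered : none;
}
```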

Regards,

Luiz

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Src addr code review (Was: Source IP incorrect on multi homed systems)
  2023-02-19 22:42                       ` Src addr code review (Was: Source IP incorrect on multi homed systems) Daniel Gröber
  2023-02-20  0:28                         ` 曹煜
@ 2023-02-20  9:47                         ` Nico Schottelius
  2023-02-20 20:43                           ` dxld
  1 sibling, 1 reply; 35+ messages in thread
From: Nico Schottelius @ 2023-02-20  9:47 UTC (permalink / raw)
  To: Daniel Gröber; +Cc: Nico Schottelius, Roman Mamedov, tlhackque, wireguard


Hey Daniel,

thanks a lot for diving in ...

Daniel Gröber <dxld@darkboxed.org> writes:
> Let's look at the code (heavily culled):
>
> 	struct flowi4 fl = {
> 		.saddr = endpoint->src4.s_addr,
> 	};
> 	if (cache)
> 		rt = dst_cache_get_ip4(cache, &fl.saddr);

What I am wondering is, how did it get into the cache in the first place?

> [...]
>
> @Nico could it perhaps simply be that you're hitting one of these zero'ing
> cases and that's why it's using regular kernel src addr selection instead
> of the cached endpoint src4 address?

That could absolutely be the case. What is funky is that I see the
problem on two very different systems, but maybe it's a good time to
elaborate on this:

- System A:
  - Wireguard module loaded on the host
  - Wireguard wg-quick used within a Kubernetes pod that has
    permissions for managing wireguard
  - The same pod also runs bird for BGP peering

- System B:
  - Wireguard running as wireguard-go on OpnSense / FreeBSD
  - BGP running with frr

Both systems exhibit the behaviour, but maybe it's better to focus on
System A first, as this seems to be more the "upstream" source.

Best regards,

Nico

--
Sustainable and modern Infrastructures by ungleich.ch

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Src addr code review (Was: Source IP incorrect on multi homed systems)
  2023-02-20  0:28                         ` 曹煜
@ 2023-02-20 10:40                           ` Nico Schottelius
  2023-02-20 11:21                             ` 曹煜
  0 siblings, 1 reply; 35+ messages in thread
From: Nico Schottelius @ 2023-02-20 10:40 UTC (permalink / raw)
  To: 曹煜
  Cc: Daniel Gröber, Nico Schottelius, Roman Mamedov, tlhackque,
	wireguard


Hello 曹煜,

on github it seems your patch was applied / the issue was closed - is
that the correct current status?

Best regards,

Nico



--
Sustainable and modern Infrastructures by ungleich.ch

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Src addr code review (Was: Source IP incorrect on multi homed systems)
  2023-02-20 10:40                           ` Nico Schottelius
@ 2023-02-20 11:21                             ` 曹煜
  0 siblings, 0 replies; 35+ messages in thread
From: 曹煜 @ 2023-02-20 11:21 UTC (permalink / raw)
  To: Nico Schottelius; +Cc: Daniel Gröber, Roman Mamedov, tlhackque, wireguard

Hi Nico,
I closed that issue myself, but the patch wasn't applied
because the issue comes from wireguard itself, and the maintainer told
me that I should send my patch to wireguard upstream (but I just gave
up on sending it to the wireguard team).

Nico Schottelius <nico.schottelius@ungleich.ch> wrote on Mon, Feb 20, 2023 at 18:41:
>
>
> Hello 曹煜,
>
> On GitHub it seems your patch was applied / the issue was closed. Is
> that the correct current status?
>
> Best regards,
>
> Nico
>
> 曹煜 <cao88yu@gmail.com> writes:
>
> > Hi all,
> > I hacked that source code myself months ago, and it works well for
> > my use case (I have 4 dual-stack PPPoE WANs set up on my OpenWrt
> > router, with a wireguard server running on it). My hack picks up the
> > dst_addr from the incoming handshake packet in the kernel sk_buff and
> > then uses that addr as the src_addr for the reply.
> > I'm not good at source code, and I know that my hack may be ugly, but
> > it works. I hope this patch can help:
> > https://github.com/openwrt/packages/issues/9538#issuecomment-1150592803
> >
> > Daniel Gröber <dxld@darkboxed.org> wrote on Mon, Feb 20, 2023 at 06:42:
> >>
> >> Hi,
> >>
> >> I thought it might be useful to do some quick and dirty code review instead
> >> of speculating wildly to figure out where these source IP selection
> >> problems could be coming from ;)
> >>
> >> From previous code deep dives I know the udp_tunnel_xmit_skb function is
> >> where tunnel packets get handed off to the kernel. So in
> >> net/wireguard/socket.c:send4 we have:
> >>
> >>         udp_tunnel_xmit_skb(rt, sock, skb, fl.saddr, fl.daddr, ds,
> >>                             ip4_dst_hoplimit(&rt->dst), 0, fl.fl4_sport,
> >>                             fl.fl4_dport, false, false);
> >>
> >> Where fl.saddr is the source address that's supposedly wrong (sometimes? I
> >> guess?) Where does that come from?
> >>
> >> Let's look at the code (heavily culled):
> >>
> >>         struct flowi4 fl = {
> >>                 .saddr = endpoint->src4.s_addr,
> >>         };
> >>         if (cache)
> >>                 rt = dst_cache_get_ip4(cache, &fl.saddr);
> >>         if (!rt) {
> >>                 if (unlikely(!inet_confirm_addr(sock_net(sock), NULL, 0,
> >>                                                 fl.saddr, RT_SCOPE_HOST)))
> >>                         fl.saddr = 0;
> >>                 if (unlikely(endpoint->src_if4 && ((IS_ERR(rt) &&
> >>                              PTR_ERR(rt) == -EINVAL) || (!IS_ERR(rt) &&
> >>                              rt->dst.dev->ifindex != endpoint->src_if4))))
> >>                         fl.saddr = 0;
> >>
> >> Well it's initialized from endpoint->src4.s_addr, overwritten with zero in
> >> some cases, which I believe lets the kernel do its regular source addr
> >> selection, and populated from something called dst_cache at some callsites.
> >>
> >> @Nico could it perhaps simply be that you're hitting one of these zero'ing
> >> cases and that's why it's using regular kernel src addr selection instead
> >> of the cached endpoint src4 address?
> >>
> >> The first case !inet_confirm_addr(..., RT_SCOPE_HOST) ought to confirm that
> >> the saddr is actually still a local address. Makes sense: if the address we
> >> remembered was removed from the interface, we can't use it anymore.
> >>
> >> The second case looks like it's checking if the (sometimes cached) src_if4
> >> interface index is still what the route we're about to use points to.
> >>
> >> If neither of those seem likely we can keep reading :)
> >>
> >> --Daniel
> >>
> >>
> >>
>
>
> --
> Sustainable and modern Infrastructures by ungleich.ch
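曹煜's patch does this inside the kernel. For comparison, a userspace UDP server on Linux can get the same "reply from the address the packet was sent to" behaviour with the IP_PKTINFO socket option documented in ip(7): recvmsg() reports the packet's header destination address in ipi_addr, and sendmsg() accepts ipi_spec_dst as the source address to use. This is a minimal, Linux-specific sketch, not 曹煜's patch; the helper names enable_pktinfo, recv_with_dst and send_from are hypothetical.

```c
#define _GNU_SOURCE  /* for struct in_pktinfo on glibc */
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>

/* Control buffer sized and aligned for one IP_PKTINFO message. */
union pktinfo_ctrl {
    char buf[CMSG_SPACE(sizeof(struct in_pktinfo))];
    struct cmsghdr align;
};

/* Ask the kernel to attach IP_PKTINFO to every received datagram. */
static int enable_pktinfo(int fd)
{
    int on = 1;
    return setsockopt(fd, IPPROTO_IP, IP_PKTINFO, &on, sizeof(on));
}

/* Receive one datagram; *local gets the destination address from its
 * IP header, i.e. the address the peer actually sent to. */
static ssize_t recv_with_dst(int fd, void *buf, size_t len,
                             struct sockaddr_in *peer, struct in_addr *local)
{
    union pktinfo_ctrl ctrl;
    struct iovec iov = { .iov_base = buf, .iov_len = len };
    struct msghdr msg = {
        .msg_name = peer, .msg_namelen = sizeof(*peer),
        .msg_iov = &iov, .msg_iovlen = 1,
        .msg_control = ctrl.buf, .msg_controllen = sizeof(ctrl.buf),
    };
    ssize_t n = recvmsg(fd, &msg, 0);
    if (n < 0)
        return n;
    local->s_addr = htonl(INADDR_ANY);
    for (struct cmsghdr *c = CMSG_FIRSTHDR(&msg); c; c = CMSG_NXTHDR(&msg, c)) {
        if (c->cmsg_level == IPPROTO_IP && c->cmsg_type == IP_PKTINFO) {
            struct in_pktinfo pi;
            memcpy(&pi, CMSG_DATA(c), sizeof(pi));
            *local = pi.ipi_addr;
        }
    }
    return n;
}

/* Send a datagram to peer, asking the kernel to use 'local' as the
 * source address instead of picking one from the route. */
static ssize_t send_from(int fd, const void *buf, size_t len,
                         const struct sockaddr_in *peer, struct in_addr local)
{
    union pktinfo_ctrl ctrl;
    memset(&ctrl, 0, sizeof(ctrl));
    struct iovec iov = { .iov_base = (void *)buf, .iov_len = len };
    struct msghdr msg = {
        .msg_name = (void *)peer, .msg_namelen = sizeof(*peer),
        .msg_iov = &iov, .msg_iovlen = 1,
        .msg_control = ctrl.buf, .msg_controllen = sizeof(ctrl.buf),
    };
    struct cmsghdr *c = CMSG_FIRSTHDR(&msg);
    c->cmsg_level = IPPROTO_IP;
    c->cmsg_type = IP_PKTINFO;
    c->cmsg_len = CMSG_LEN(sizeof(struct in_pktinfo));
    struct in_pktinfo pi;
    memset(&pi, 0, sizeof(pi));
    pi.ipi_spec_dst = local;  /* ipi_ifindex stays 0: routing picks the interface */
    memcpy(CMSG_DATA(c), &pi, sizeof(pi));
    return sendmsg(fd, &msg, 0);
}
```

Setting the source address this way does not by itself pick the egress interface; that is still up to the routing lookup, which is the concern Janne raises later in the thread.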


* Re: Src addr code review (Was: Source IP incorrect on multi homed systems)
  2023-02-20  9:47                         ` Nico Schottelius
@ 2023-02-20 20:43                           ` dxld
  0 siblings, 0 replies; 35+ messages in thread
From: dxld @ 2023-02-20 20:43 UTC (permalink / raw)
  To: Nico Schottelius; +Cc: Roman Mamedov, tlhackque, wireguard

Hi Nico,

On Mon, Feb 20, 2023 at 10:47:36AM +0100, Nico Schottelius wrote:
> Daniel Gröber <dxld@darkboxed.org> writes:
> > Let's look at the code (heavily culled):
> >
> > 	struct flowi4 fl = {
> > 		.saddr = endpoint->src4.s_addr,
> > 	};
> > 	if (cache)
> > 		rt = dst_cache_get_ip4(cache, &fl.saddr);
> 
> What I am wondering is, how did it get into the cache in the first place?

Right so, endpoint->src4 is set in wg_socket_set_peer_endpoint, which is
called either through wg_socket_set_peer_endpoint_from_skb in the handshake
receive code or directly in the data path.

The _from_skb variant also calls wg_socket_endpoint_from_skb. Here we
remember the received packet's source address in addr4 and its destination
address in src4, which becomes the source we send from, as you'd expect:

	endpoint->addr4.sin_family = AF_INET;
	endpoint->addr4.sin_port = udp_hdr(skb)->source;
	endpoint->addr4.sin_addr.s_addr = ip_hdr(skb)->saddr;
	endpoint->src4.s_addr = ip_hdr(skb)->daddr;
	endpoint->src_if4 = skb->skb_iif;
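The snippet above is the address swap Nico asked for in his first message: the received packet's source becomes where replies go, and its destination becomes the source replies should use. Below is a userspace restatement in C. It assumes a hypothetical flattened packet struct (rx_packet) in place of the sk_buff, and the other names (endpoint4, endpoint_from_packet) are hypothetical too.

```c
#include <assert.h>
#include <netinet/in.h>
#include <stdint.h>

/* Hypothetical flattened view of a received IPv4/UDP packet; address
 * and port fields are in network byte order, as on the wire. */
struct rx_packet {
    uint32_t ip_saddr;
    uint32_t ip_daddr;
    uint16_t udp_source;
    uint16_t udp_dest;
    int iif;  /* ingress interface index */
};

/* Hypothetical mirror of the endpoint fields set in the snippet. */
struct endpoint4 {
    struct sockaddr_in addr4;  /* where replies are sent: the packet's source */
    struct in_addr src4;       /* source to reply from: the packet's destination */
    int src_if4;               /* interface the packet arrived on */
};

/* Fill an endpoint from a received packet by swapping the roles of
 * its source and destination addresses. */
static void endpoint_from_packet(struct endpoint4 *ep, const struct rx_packet *p)
{
    ep->addr4.sin_family = AF_INET;
    ep->addr4.sin_port = p->udp_source;
    ep->addr4.sin_addr.s_addr = p->ip_saddr;
    ep->src4.s_addr = p->ip_daddr;
    ep->src_if4 = p->iif;
}
```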

The dst_cache is set just after those zero'ing conditionals we were looking
at before. It's cleared whenever the endpoint/port changes or one of those
cases is hit. Note the dst_cache is only used for data packets, so
handshakes would be unaffected if it was the cause of your woes.
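The lifecycle described here (filled in after the checks on a miss, handed back as-is on a hit, reset when the endpoint changes) can be sketched as a one-entry cache. This is a toy model with hypothetical names (toy_dst_cache and friends), not the kernel's dst_cache API. The point it illustrates comes from the quoted `if (!rt)` guard: while an entry is cached, the zeroing checks are not re-run.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical one-entry route cache. */
struct toy_dst_cache {
    bool valid;
    int ifindex;     /* egress interface of the cached route */
    uint32_t saddr;  /* source address stored with it */
};

static void toy_cache_reset(struct toy_dst_cache *c)
{
    c->valid = false;  /* e.g. the peer's endpoint or port changed */
}

/* Returns the source address to send from. On a hit the cached value is
 * reused and nothing is re-checked; on a miss the freshly computed
 * (already checked) result is stored for next time. */
static uint32_t toy_send_saddr(struct toy_dst_cache *c,
                               int fresh_ifindex, uint32_t fresh_saddr)
{
    if (c->valid)
        return c->saddr;
    c->valid = true;
    c->ifindex = fresh_ifindex;
    c->saddr = fresh_saddr;
    return fresh_saddr;
}
```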

> > @Nico could it perhaps simply be that you're hitting one of these zero'ing
> > cases and that's why it's using regular kernel src addr selection instead
> > of the cached endpoint src4 address?
> 
> That could absolutely be the case. What is funky is that I see the
> problem on two very different systems
>
> Both systems exhibit the behaviour, but maybe it's better to focus on
> System A first, as this seems to be more the "upstream" source.

It is weird indeed, but yeah. One thing at a time.

BTW, what kernel version/distro are we dealing with?

--Daniel


* Re: Source IP incorrect on multi homed systems
@ 2023-02-20 11:09 Janne Johansson
  0 siblings, 0 replies; 35+ messages in thread
From: Janne Johansson @ 2023-02-20 11:09 UTC (permalink / raw)
  To: WireGuard mailing list

Rewriting for the list: I managed to bold some pasted text and hence
got blocked, since HTML mails are not allowed on the list.

On Sun, 19 Feb 2023 at 21:17, Nico Schottelius
<nico.schottelius@ungleich.ch> wrote:
> Janne Johansson <icepic.dz@gmail.com> writes:
> > *) https://en.wiktionary.org/wiki/Chesterton%27s_fence
>
> I am happy to have learned a new principle today, thanks for that.
>
> And to be sure that everyone is on the same page:
>
>     Wireguard should reply by default with the source address that
>     used to be the destination address, but at the moment wireguard is
>     not doing that.
>
> If anyone disagrees with above statement, please let me know.

I disagree, but perhaps only because that statement is slightly too short.

Let's assume I have two ISPs and hence a multihomed wg peer, with IP
A.x.x.x from ISP A and IP B.x.x.x from ISP B. For some reason, this
box has a routing table that says "prefer link A to reach the
internet", but I set up client C to run wireguard towards B.1.2.3, and
client C sends its UDP packet with src IP C and dest IP B.x.x.x. Since
UDP is stateless, the "response" from the multihomed server is created
"out of thin air" as a new UDP packet destined for C. We don't feel
it is unrelated to the previously received packet, but from the TCP/IP
stack's perspective it is.

The routing table now decides that interface A will be the awesomest
for sending UDP to C, so the kernel creates a packet with source IP
A.x.x.x and dest IP C.x.x.x and sends it off. This surprise seems to
be the main issue in this thread. Perhaps we see this multihomed box
as slightly misconfigured as far as wireguard goes; perhaps it should
have advertised A.x.x.x instead of B.x.x.x as the wg endpoint to the
client, or whatever, but the facts remain.

Now, the statement you hope to get everyone to agree on would also
need to include "sending it back on interface B, to the gw used by
interface B to ISP B if there is one", or else ISP A might drop the
packet as being sent from a "forged" address, since it looks like a
fake source IP from ISP A's perspective. The routing lookups, before
any applied tricks, look at destination IPs only and make the
decision based on that.
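The point that the lookup keys on the destination only can be shown with a toy longest-prefix-match table. The addresses are hypothetical documentation prefixes standing in for A.x.x.x (192.0.2.0/24), B.x.x.x (198.51.100.0/24) and client C (203.0.113.5). pref_src stands in for the source address the kernel would use on that route, and the real kernel FIB is far more involved.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical route entry; addresses in host byte order. */
struct route {
    uint32_t prefix;
    int len;            /* prefix length, 0..32 */
    int ifindex;        /* egress interface */
    uint32_t pref_src;  /* source address used for packets on this route */
};

static uint32_t prefix_mask(int len)
{
    /* Shifting a 32-bit value by 32 is undefined, so /0 is special-cased. */
    return len == 0 ? 0u : (uint32_t)(0xFFFFFFFFu << (32 - len));
}

/* Longest-prefix match on the destination alone: the source address of
 * the packet being "answered" is not an input at all. */
static const struct route *lookup_by_dst(const struct route *tbl, size_t n,
                                         uint32_t dst)
{
    const struct route *best = NULL;
    for (size_t i = 0; i < n; i++) {
        uint32_t m = prefix_mask(tbl[i].len);
        if ((dst & m) == (tbl[i].prefix & m) &&
            (best == NULL || tbl[i].len > best->len))
            best = &tbl[i];
    }
    return best;
}
```

With a default route via A and only B's own /24 via B, the reply to C resolves to interface A and A's source address, even though C's packet was addressed to B.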

I think the proposed solution, while attractive at first glance, may
be trading one kind of "surprise" behaviour for another, where
interface B might be less useful than A, which would explain why the
default route is set to use A. If you look at the many posts on the
internet over many years about "why udp source ip got chosen wrong on
multihomed boxes" you see answers like:
"You either bind(2) to each interface address and manage multiple
sockets, or let the kernel do the implicit source IP assignment with
INADDR_ANY. There is no other way."
( https://stackoverflow.com/questions/3062205/setting-the-source-ip-for-a-udp-socket
, not about vpns and lots older than wireguard)
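The first option in that answer, one socket per local address, looks roughly like this in C. This is a minimal sketch assuming POSIX sockets; udp_socket_bound_to and open_per_address are hypothetical helpers. A datagram sent from a socket bound to a specific address carries that address as its source, and bind() fails for an address that is not configured, which is exactly why the set of sockets has to follow addresses as they come and go.

```c
#include <assert.h>
#include <netinet/in.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Open a UDP socket bound to one local IPv4 address and port (both in
 * network byte order). Returns the fd, or -1 on failure, e.g. when the
 * address is not configured on this host (bind() reports EADDRNOTAVAIL). */
static int udp_socket_bound_to(uint32_t addr_be, uint16_t port_be)
{
    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0)
        return -1;
    struct sockaddr_in sa;
    memset(&sa, 0, sizeof(sa));
    sa.sin_family = AF_INET;
    sa.sin_port = port_be;
    sa.sin_addr.s_addr = addr_be;
    if (bind(fd, (struct sockaddr *)&sa, sizeof(sa)) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Open one socket per local address, all with the same port_be (0 lets
 * the kernel pick one per socket). fds[i] is -1 where binding failed;
 * returns how many sockets were opened. */
static size_t open_per_address(const uint32_t *addrs_be, size_t n,
                               uint16_t port_be, int *fds)
{
    size_t opened = 0;
    for (size_t i = 0; i < n; i++) {
        fds[i] = udp_socket_bound_to(addrs_be[i], port_be);
        if (fds[i] >= 0)
            opened++;
    }
    return opened;
}
```

Replying on the socket a request arrived on then keeps source and destination symmetric without any per-packet source selection.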

What this means is that if you have a box where links and interfaces
come and go (USB Wi-Fi dongles, tethered cell phones, ...), then wireguard
would have to do a lot of extra work trying to keep track of which
interfaces exist, instead of just binding to INADDR_ANY and letting the
kernel handle this by itself in the normal, but to some surprising, way
for UDP packets.

My gut feeling is that if you have a setup like this example
multihomed peer, you get to do some extra steps, which may include the
aforementioned firewall mark "tricks", using VRFs/namespaced
interfaces/routing domains, adding a specific route to client C's IP
over link B, binding wg to a loopback interface, source-NATing outgoing
wg traffic, or something along those lines, in order to have a wg
endpoint on a less-preferred interface without causing issues with
stateful NAT gateways at client C.
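Several of those extra steps (firewall-mark tricks, routing domains, a specific route for client C) come down to making the route choice depend on more than the destination. Below is a toy model of one such mechanism, loosely in the spirit of Linux policy routing: a rule keyed on the packet's source address selects a separate table before the main one is consulted. The types and names are hypothetical, and each table is reduced to a single default route.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical table reduced to one default route. */
struct table_route {
    int ifindex;       /* egress interface */
    uint32_t gateway;  /* next hop, host byte order */
};

/* Hypothetical rule: packets with this exact source use 'table'. */
struct src_rule {
    uint32_t from;     /* host byte order */
    const struct table_route *table;
};

/* Rules are tried in order; only if none matches the source is the main
 * (destination-based) table used. A source of 0 (not chosen yet) never
 * matches a rule here. */
static const struct table_route *policy_lookup(const struct src_rule *rules,
                                               size_t n, uint32_t saddr,
                                               const struct table_route *main_tbl)
{
    for (size_t i = 0; i < n; i++)
        if (saddr != 0 && rules[i].from == saddr)
            return rules[i].table;
    return main_tbl;
}
```

The catch, visible in the model, is that a source rule only helps once the reply already carries B's address as its source; if the source is left at 0 for the kernel to pick, the main table, and therefore link A, wins again.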

--
May the most significant bit of your life be positive.


end of thread

Thread overview: 35+ messages
2023-02-18 20:14 Source IP incorrect on multi homed systems Nico Schottelius
     [not found] ` <CAHx9msc1cNV80YU7HRmQ9gsjSEiVZ=pb31aYqfP62hy8DeuGZA@mail.gmail.com>
2023-02-18 22:34   ` Nico Schottelius
2023-02-19  0:45 ` Mike O'Connor
2023-02-19  8:01   ` Nico Schottelius
2023-02-19  9:19     ` Mikma
2023-02-19 12:04       ` Nico Schottelius
2023-02-19 12:10     ` Nico Schottelius
2023-02-19 18:59       ` Peter Linder
     [not found]     ` <2ed829aaed9fec59ac2a9b32c4ce0a9005b8d8b850be81c81a226791855fe4eb@mu.id>
2023-02-19 12:13       ` Nico Schottelius
2023-02-19 14:39         ` Christoph Loesch
2023-02-19 16:32           ` David Kerr
2023-02-19 16:54             ` Sebastian Hyrvall
2023-02-19 18:04               ` Janne Johansson
2023-02-19 18:08                 ` Sebastian Hyrvall
2023-02-19 20:11                 ` Nico Schottelius
2023-02-19 17:05             ` tlhackque
2023-02-19 18:37               ` David Kerr
2023-02-19 18:52                 ` tlhackque
2023-02-19 18:42               ` tlhackque
2023-02-19 20:18                 ` Nico Schottelius
2023-02-19 20:42                   ` Roman Mamedov
2023-02-19 21:19                     ` Nico Schottelius
2023-02-19 22:06                       ` tlhackque
2023-02-19 22:42                       ` Src addr code review (Was: Source IP incorrect on multi homed systems) Daniel Gröber
2023-02-20  0:28                         ` 曹煜
2023-02-20 10:40                           ` Nico Schottelius
2023-02-20 11:21                             ` 曹煜
2023-02-20  9:47                         ` Nico Schottelius
2023-02-20 20:43                           ` dxld
2023-02-19 21:39                     ` Source IP incorrect on multi homed systems tlhackque
     [not found]               ` <CADGd2DoE6TCtCxxWL7JWyNW5+yy_Pe+9MNzHznbudMWLTXQreA@mail.gmail.com>
2023-02-19 18:30                 ` Fwd: " John Lauro
2023-02-19 22:28                 ` tlhackque
2023-02-20  0:58                   ` Luiz Angelo Daros de Luca
2023-02-19 20:02           ` Nico Schottelius
2023-02-20 11:09 Janne Johansson
