potentially disallowing IP fragmentation on wg packets, and handling routing loops better

Development discussion of WireGuard

* potentially disallowing IP fragmentation on wg packets, and handling routing loops better
@ 2021-06-06  9:13 Jason A. Donenfeld
  2021-06-06  9:32 ` Nico Schottelius
                   ` (4 more replies)
  0 siblings, 5 replies; 15+ messages in thread
From: Jason A. Donenfeld @ 2021-06-06  9:13 UTC (permalink / raw)
  To: WireGuard mailing list
  Cc: Roman Mamedov, zrm, StarBrilliant, Baptiste Jonglez, Joe Holden

Hi,

WireGuard is an encrypted point-to-multipoint tunnel, where onion
layering of packets via a single interface or multiple is a useful
feature. This makes handling routing loops very hard to manage and
detect. I'm considering changing and simplifying loop mitigation to a
different strategy, but not without some discussion of its
implications.

Specifically the change would be to not allow IP fragmentation of the
encrypted UDP packets. This way, in the case of a loop, eventually the
packet size exceeds MTU, and it gets dropped: dumb and effective.
Depending on how this discussion goes, a compromise would be to not
allow fragmentation, but only for forwarded and kernel-generated
packets, but not for locally generated userspace packets. That's more
complex and I don't like it as much as just disallowing IP
fragmentation altogether.
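
The size-growth argument above can be sketched as a toy model. This is only an illustration under stated assumptions, not code from WireGuard: it assumes each pass through the loop adds a fixed per-packet overhead (60 bytes over IPv4, 80 over IPv6: outer IP header, UDP header, 16-byte data header, 16-byte tag), ignores padding, and uses a single link MTU. The function name is hypothetical.

```python
# Toy model of a tunnel routing loop with fragmentation disallowed.
# Assumed WireGuard overhead per encapsulation (not stated in the message):
# outer IP header + 8-byte UDP header + 16-byte data header + 16-byte auth tag.
WG_OVERHEAD_IPV4 = 20 + 8 + 16 + 16  # 60 bytes
WG_OVERHEAD_IPV6 = 40 + 8 + 16 + 16  # 80 bytes


def passes_until_drop(inner_len: int, link_mtu: int,
                      overhead: int = WG_OVERHEAD_IPV4) -> int:
    """Return how many re-encapsulations still fit in link_mtu.

    Each pass wraps the previous outer packet again, so the size grows by
    `overhead` bytes per pass. Padding is ignored for simplicity.
    """
    passes = 0
    size = inner_len
    while size + overhead <= link_mtu:
        size += overhead
        passes += 1
    return passes


if __name__ == "__main__":
    for inner in (100, 1400, 1500):
        print(inner, passes_until_drop(inner, link_mtu=1500))
```

In this model a small packet survives a bounded number of passes and a near-MTU packet is dropped almost immediately, which is the "dumb and effective" property being described.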

Pros:
- It solves the routing loop problem very simply.
- Usually when people are fragmenting packets like that, things become
very, very slow anyway, and it'd be better to just stop working
entirely, so that people adjust their MTU.
- Is anybody actually relying on this?

Cons:
- Maybe people are running
wireguard-over-gre-over-vxlan-over-l2tp-over-pppoe-over-god-knows-what-else,
and this reduces the MTU to below 1280, yet they still want to put
IPv6 through wireguard, and are willing to accept the performance
implications.
- Some people don't know how to fix their MTUs, and breaking rather
than just becoming really slow isn't the best outcome there, maybe.
- Maybe people are relying on this?

Before anybody asks: we're not going to add a knob for this, simply by
virtue of this being a decision with pros and cons. Please don't bring
that up.

I'd be very interested in opinions about this. Are there additional
pros and cons? I know the matter has come up a few times on the list,
mostly with people _wanting_ fragmentation (I've CCd a few people from
those threads - Roman, I expect you to vigorously argue the
pro-fragmentation stance ;-), but I'm not convinced the outcome of
those threads was correct, other than, "yea, that's easy enough to
enable." But on the other hand, maybe the cons are real enough we
should rethink this.

Please let me know thoughts and ideas.

Thanks,
Jason


* Re: potentially disallowing IP fragmentation on wg packets, and handling routing loops better
  2021-06-06  9:13 potentially disallowing IP fragmentation on wg packets, and handling routing loops better Jason A. Donenfeld
@ 2021-06-06  9:32 ` Nico Schottelius
  2021-06-06 10:39 ` Vasili Pupkin
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 15+ messages in thread
From: Nico Schottelius @ 2021-06-06  9:32 UTC (permalink / raw)
  To: Jason A. Donenfeld
  Cc: Roman Mamedov, zrm, StarBrilliant, Baptiste Jonglez, Joe Holden,
	wireguard

Hello,

so if fragmentation is disallowed, PMTU discovery always needs to work
and the wireguard MTU needs to be adjusted correctly.

Speaking of a DC situation, I think this might be tricky. Imagine the
following situation:

- endhost A has an MTU of 9k. PMTU 9k. wg 8920.
- the path changes, the PMTU reduces to 1.5k (this is something we see
 happening from time to time)
- How is the wg MTU adjusted in this situation?
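
The numbers in this scenario can be checked with a small sketch. It assumes the usual worst-case WireGuard overhead of 80 bytes (IPv6 outer header, UDP header, WireGuard header and tag), which is consistent with the 9k / 8920 pair above; the helper names are made up for illustration.

```python
# Assumption: 80 bytes of worst-case WireGuard overhead per packet
# (40-byte IPv6 header + 8-byte UDP header + 32 bytes of WireGuard
# data header and authentication tag).
WG_OVERHEAD = 80


def wg_mtu_for_path(path_mtu: int) -> int:
    """Largest tunnel MTU whose encrypted packets still fit the path."""
    return path_mtu - WG_OVERHEAD


def encrypted_packet_fits(wg_mtu: int, path_mtu: int) -> bool:
    """True if a full-size tunnel packet fits the path without fragmenting."""
    return wg_mtu + WG_OVERHEAD <= path_mtu


if __name__ == "__main__":
    print(wg_mtu_for_path(9000))              # tunnel MTU for the 9k path
    print(wg_mtu_for_path(1500))              # tunnel MTU after the path change
    print(encrypted_packet_fits(8920, 1500))  # stale tunnel MTU on the new path
```

With the tunnel MTU left at 8920 after the path drops to 1.5k, full-size encrypted packets no longer fit, so they would have to be either fragmented or dropped.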

And to clarify: with disallowing IP frag, you are obviously only
referring to the outer transport. Within the tunnels, IPv4 and IPv6
packets can still be fragmented, so application operation is not really
affected.

Interesting approach, but I am not really sure it is realistically
feasible, especially when thinking about long range/low bandwidth media
where you'd basically say "wg cannot do IPv6 on these media". Satellite
systems should probably work fine, I am more concerned about mesh
networks, in which wg is quite popular already.

Cheers,

Nico

--
Sustainable and modern Infrastructures by ungleich.ch


* Re: potentially disallowing IP fragmentation on wg packets, and handling routing loops better
  2021-06-06  9:13 potentially disallowing IP fragmentation on wg packets, and handling routing loops better Jason A. Donenfeld
  2021-06-06  9:32 ` Nico Schottelius
@ 2021-06-06 10:39 ` Vasili Pupkin
  2021-06-06 11:14 ` Peter Linder
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 15+ messages in thread
From: Vasili Pupkin @ 2021-06-06 10:39 UTC (permalink / raw)
  To: WireGuard mailing list

Hi,

I dug into this subject two years ago and only vaguely remember the
details. As far as I can recall, there was a time when WireGuard set the
DF flag by default, and there were two issues:

1) for security reasons, WireGuard doesn't send an ICMP "fragmentation
needed" response in the unencrypted channel if an encrypted packet
didn't fit and was dropped
2) there is no way for the client to tell the server about the MTU
limitation it has on its side

Combining the two: in a chained wireguard VPN setup where the MTU is
misconfigured on the server, the remote host wouldn't get any ICMP to
help with its PMTUD algorithm. The client can still set the MSS on its
TCP connections, though.
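
As a rough illustration of the MSS workaround: the MSS a client would want follows from the tunnel MTU minus the inner IP and TCP headers. The sketch below assumes option-free headers and uses 1420 purely as an example tunnel MTU; the function name is hypothetical.

```python
TCP_HEADER = 20  # bytes, assuming no TCP options


def tcp_mss(tunnel_mtu: int, inner_ipv6: bool = False) -> int:
    """MSS that keeps a TCP segment within tunnel_mtu inside the tunnel."""
    ip_header = 40 if inner_ipv6 else 20  # inner IPv6 vs. IPv4 header
    return tunnel_mtu - ip_header - TCP_HEADER


if __name__ == "__main__":
    print(tcp_mss(1420))                   # inner IPv4
    print(tcp_mss(1420, inner_ipv6=True))  # inner IPv6
```

This only helps TCP; UDP-based protocols inside the tunnel get no such hint.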

Again, sorry if I missed or mixed up something; it was long ago and I
don't remember the details.





* Re: potentially disallowing IP fragmentation on wg packets, and handling routing loops better
  2021-06-06  9:13 potentially disallowing IP fragmentation on wg packets, and handling routing loops better Jason A. Donenfeld
  2021-06-06  9:32 ` Nico Schottelius
  2021-06-06 10:39 ` Vasili Pupkin
@ 2021-06-06 11:14 ` Peter Linder
  2021-06-07 11:58   ` Derek Fawcus
  2021-06-06 19:03 ` Roman Mamedov
  2021-06-07  9:34 ` Jason A. Donenfeld
  4 siblings, 1 reply; 15+ messages in thread
From: Peter Linder @ 2021-06-06 11:14 UTC (permalink / raw)
  To: wireguard

This would break things for me. We're doing a lot of L2 over L3 site to 
site stuff and we are using wireguard as the outer layer. Inner layer is 
vxlan or l2tpv3.

In particular, people connect lots of stuff with no regard for MTU. For 
some things it's also very hard to change so we just assume people 
don't. Since the L3 network typically has the same MTU as the inner L2 
network, we need fragmentation. There is no practical way to be able to 
tell hosts on the L2 network about the limited mtu, for all we know they 
don't even run IP....

It really does work without a hassle, and it is not very, very slow at all.
Performance is down by perhaps a factor of 3 compared to setting a
smaller MTU/MSS, but we can still push 350 Mbit/s with a 2 GHz Atom CPU,
and around 800 Mbit/s with a Xeon CPU, with fragmentation for most
packets. This is one case where wireguard really works well!

IMHO, having wireguard generate fragmentable packets adds a lot to its
usefulness. With that said, it's not the end of the world for me, as I
can just compile my own, but I'd rather not :-)




* Re: potentially disallowing IP fragmentation on wg packets, and handling routing loops better
  2021-06-06  9:13 potentially disallowing IP fragmentation on wg packets, and handling routing loops better Jason A. Donenfeld
                   ` (2 preceding siblings ...)
  2021-06-06 11:14 ` Peter Linder
@ 2021-06-06 19:03 ` Roman Mamedov
  2021-06-06 22:33   ` Joe Holden
  2021-06-07  9:34 ` Jason A. Donenfeld
  4 siblings, 1 reply; 15+ messages in thread
From: Roman Mamedov @ 2021-06-06 19:03 UTC (permalink / raw)
  To: Jason A. Donenfeld
  Cc: WireGuard mailing list, zrm, StarBrilliant, Baptiste Jonglez, Joe Holden

On Sun, 6 Jun 2021 11:13:36 +0200
"Jason A. Donenfeld" <Jason@zx2c4.com> wrote:

> Specifically the change would be to not allow IP fragmentation of the
> encrypted UDP packets. This way, in the case of a loop, eventually the
> packet size exceeds MTU, and it gets dropped: dumb and effective.
> Depending on how this discussion goes, a compromise would be to not
> allow fragmentation, but only for forwarded and kernel-generated
> packets, but not for locally generated userspace packets. That's more
> complex and I don't like it as much as just disallowing IP
> fragmentation all together.
> 
> Pros:
> - It solves the routing loop problem very simply.

Doesn't TTL already solve this?

> - Maybe people are running
> wireguard-over-gre-over-vxlan-over-l2tp-over-pppoe-over-god-knows-what-else,
> and this reduces the MTU to below 1280, yet they still want to put
> IPv6 through wireguard, and are willing to accept the performance
> implications.

Not only that. Sometimes transparent bridging of 1500 MTU LANs is required.

VXLAN does not allow tunnel endpoints to produce fragmented VXLAN packets.

With WG we can fragment them one level lower, *and* gain a higher efficiency
compared to hypothetical VXLAN's fragmentation, due to less header overhead on
2nd and further packets in a chain.

It would be unfortunate if this were no longer possible.

It appears to me that people who might need to transparently join multiple
Ethernet LANs due to legacy network topologies they have to work with, weird
requirements, various legacy software etc, would outnumber those who even run
WG over WG at all, let alone getting themselves into a routing loop that way.

-- 
With respect,
Roman


* Re: potentially disallowing IP fragmentation on wg packets, and handling routing loops better
  2021-06-06 19:03 ` Roman Mamedov
@ 2021-06-06 22:33   ` Joe Holden
  0 siblings, 0 replies; 15+ messages in thread
From: Joe Holden @ 2021-06-06 22:33 UTC (permalink / raw)
  To: Roman Mamedov, Jason A. Donenfeld
  Cc: WireGuard mailing list, zrm, StarBrilliant, Baptiste Jonglez

On 2021-06-06 21:03, Roman Mamedov wrote:
> On Sun, 6 Jun 2021 11:13:36 +0200
> "Jason A. Donenfeld" <Jason@zx2c4.com> wrote:
> 
>> Specifically the change would be to not allow IP fragmentation of the
>> encrypted UDP packets. This way, in the case of a loop, eventually the
>> packet size exceeds MTU, and it gets dropped: dumb and effective.
>> Depending on how this discussion goes, a compromise would be to not
>> allow fragmentation, but only for forwarded and kernel-generated
>> packets, but not for locally generated userspace packets. That's more
>> complex and I don't like it as much as just disallowing IP
>> fragmentation all together.
>>
>> Pros:
>> - It solves the routing loop problem very simply.
> 
> Doesn't TTL already solve this?
> 
>> - Maybe people are running
>> wireguard-over-gre-over-vxlan-over-l2tp-over-pppoe-over-god-knows-what-else,
>> and this reduces the MTU to below 1280, yet they still want to put
>> IPv6 through wireguard, and are willing to accept the performance
>> implications.
> 
> Not only that. Sometimes transparent bridging of 1500 MTU LANs is required.
> 
> VXLAN does not allow tunnel endpoints to produce fragmented VXLAN packets.
> 
> With WG we can fragment them one level lower, *and* gain a higher efficiency
> compared to hypothetical VXLAN's fragmentation, due to less header overhead on
> 2nd and further packets in a chain.
> 
> It would be unfortunate if this will become no longer possible.
> 
> It appears to me that people who might need to transparently join multiple
> Ethernet LANs due to legacy network topologies they have to work with, weird
> requirements, various legacy software etc, would outnumber those who even run
> WG over WG at all, let alone getting themselves into a routing loop that way.
> 
All of the above, really - not allowing "full" sized frames over WG
breaks a huge number of use cases - even simple ones, because regardless
of how much it's wished to be true, in reality pmtu isn't very useful
and doesn't work for many cases even in an environment where it isn't
completely broken by firewalls/misconfiguration.

A [probably common] simple example is where there are 1500 byte speakers
on either side of a WG link (e.g. the internet, or some satellite site)
- having a <1500 byte link in the middle will break many applications,
particularly UDP-based protocols.

Unfortunately the better solution is likely to make it configurable, or
to allow fragmentation for forwarded traffic (since the host already
knows the MTU, this solves the problem without requiring any user
config) - although understandably you don't want to add more complexity.

thanks


* Re: potentially disallowing IP fragmentation on wg packets, and handling routing loops better
  2021-06-06  9:13 potentially disallowing IP fragmentation on wg packets, and handling routing loops better Jason A. Donenfeld
                   ` (3 preceding siblings ...)
  2021-06-06 19:03 ` Roman Mamedov
@ 2021-06-07  9:34 ` Jason A. Donenfeld
  2021-06-07 11:13   ` Roman Mamedov
                     ` (2 more replies)
  4 siblings, 3 replies; 15+ messages in thread
From: Jason A. Donenfeld @ 2021-06-07  9:34 UTC (permalink / raw)
  To: WireGuard mailing list
  Cc: Roman Mamedov, zrm, StarBrilliant, Baptiste Jonglez, Joe Holden,
	Nico Schottelius, Vasili Pupkin, peter

Hey folks,

There seems to be a bit of confusion about *which* stage of
fragmentation would be affected by the proposal, so I drew some
diagrams to help illustrate what I'm talking about. Please take a
look:

https://data.zx2c4.com/potential-wg-fragmentation-proposal.png

1) Ingress fragmentation would not be affected by this and is not
relevant for this discussion. This is the case in which a computer
gets a packet for forwarding out of the wireguard interface, and it's
larger than the interface's mtu, so the computer fragments it before
passing it onto that interface. I'm not suggesting any change in this
behavior.

2) Local egress fragmentation WOULD be affected by this and is the
most relevant thing in this discussion. In this case, a packet that
gets encrypted and winds up being larger than the mtu of the interface
that the encrypted packet will go out of gets fragmented. In this
case, we could likely respond with an ICMP packet or similar in-path
error. But keep in mind this whole situation is local: it usually will
only happen out of misconfiguration. The best fix for the diagram I
drew would be for the administrator to decrease the MTU of the
wireguard interface to 1412.

3) Path egress fragmentation COULD be affected by this, but doesn't
have to be. In this case, we simply set "don't fragment" on encrypted
egress packets, which means they won't be fragmented by other
computers along the path.
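
For readers who want to see what "don't fragment" on a UDP sender looks like from userspace, here is a minimal sketch for Linux and IPv4. It is only an analogue: the in-kernel WireGuard implementation does not go through this socket API, and the function name is made up. The numeric fallbacks are the Linux values of `IP_MTU_DISCOVER` and `IP_PMTUDISC_DO`, used in case Python's `socket` module does not export those names.

```python
import socket
import sys

# Linux constants from <linux/in.h>; fall back to the numeric values if
# this Python build does not expose them by name.
IP_MTU_DISCOVER = getattr(socket, "IP_MTU_DISCOVER", 10)
IP_PMTUDISC_DO = getattr(socket, "IP_PMTUDISC_DO", 2)


def open_df_udp_socket() -> socket.socket:
    """Open an IPv4 UDP socket whose datagrams carry the DF bit (Linux).

    With IP_PMTUDISC_DO the kernel sets DF and refuses to fragment
    locally; an oversized send fails with EMSGSIZE instead.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.IPPROTO_IP, IP_MTU_DISCOVER, IP_PMTUDISC_DO)
    return sock


if __name__ == "__main__" and sys.platform.startswith("linux"):
    s = open_df_udp_socket()
    print(s.getsockopt(socket.IPPROTO_IP, IP_MTU_DISCOVER))
    s.close()
```

With this mode set, both local fragmentation (case 2) and on-path fragmentation (case 3) are ruled out for that socket's IPv4 traffic.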

So, of those concerned about this, which concerns are actually about
(2) and (3)? Of those, which ones are about (2)? If you have concerns
specifically about (2) that couldn't be fixed with reasonable system
administration, I'd like to hear why and what the setup is that leads
to that situation.

As an aside, Roman asked about TTL. When tunneling, the outer packet
header always must take the new TTL of the route to the tunnel
endpoint, and not do anything with the potentially much smaller inner
TTL. So with tunneling, you can't quite rely on the TTL to drop to
zero as you'd wish. Hence, I'm interested in using the natural packet
size expansion instead.

Thanks for the discussion so far. I'm very interested to read
clarifying points about applicability to case (2) (and to a lesser
extent, about case (3)).

Thanks,
Jason


* Re: potentially disallowing IP fragmentation on wg packets, and handling routing loops better
  2021-06-07  9:34 ` Jason A. Donenfeld
@ 2021-06-07 11:13   ` Roman Mamedov
  2021-06-07 11:27     ` Jason A. Donenfeld
  2021-06-07 11:18   ` Nico Schottelius
  2021-06-09 23:26   ` Vasili Pupkin
  2 siblings, 1 reply; 15+ messages in thread
From: Roman Mamedov @ 2021-06-07 11:13 UTC (permalink / raw)
  To: Jason A. Donenfeld
  Cc: WireGuard mailing list, zrm, StarBrilliant, Baptiste Jonglez,
	Joe Holden, Nico Schottelius, Vasili Pupkin, peter

On Mon, 7 Jun 2021 11:34:21 +0200
"Jason A. Donenfeld" <Jason@zx2c4.com> wrote:

> 2) Local egress fragmentation WOULD be affected by this and is the
> most relevant thing in this discussion. In this case, a packet that
> gets encrypted and winds up being larger than the mtu of the interface
> that the encrypted packet will go out of gets fragmented. In this
> case, we could likely respond with an ICMP packet or similar in-path
> error. But keep in mind this whole situation is local: it usually will
> only happen out of misconfiguration. The best fix for the diagram I
> drew would be for the administrator to decrease the MTU of the
> wireguard interface to 1412.

In the L2 tunneling scenario the large VXLAN packets are generated locally, as
it will be common for the same host (aka "the router") to be both a WG peer
and a VXLAN VTEP, so it is going to be affected.

> So, of those concerned about this, which concerns are actually about
> (2) and (3)? Of those, which ones are about (2)? If you have concerns
> specifically about (2) that couldn't be fixed with reasonable system
> administration, I'd like to hear why and what the setup is that leads
> to that situation.

My described case is being able to transparently bridge two Ethernet LANs.

Hopefully the answer isn't "you don't really need to do that" or "apply
reasonable system administration and set up routing instead".

> As an aside, Roman asked about TTL. When tunneling, the outer packet
> header always must take the new TTL of the route to the tunnel
> endpoint, and not do anything with the potentially much smaller inner
> TTL.

As far as I can see the inner TTL is not smaller than usual on WG tunnels (64).
You could copy it to the outer header of the tunnel, like GRE does:
https://serverfault.com/questions/827239/gre-tunnel-ttl-number
But of course that leaks a tiny bit of information about the encrypted
tunnel; I don't know how critical that would be.

-- 
With respect,
Roman


* Re: potentially disallowing IP fragmentation on wg packets, and handling routing loops better
  2021-06-07  9:34 ` Jason A. Donenfeld
  2021-06-07 11:13   ` Roman Mamedov
@ 2021-06-07 11:18   ` Nico Schottelius
  2021-06-09 23:26   ` Vasili Pupkin
  2 siblings, 0 replies; 15+ messages in thread
From: Nico Schottelius @ 2021-06-07 11:18 UTC (permalink / raw)
  To: Jason A. Donenfeld
  Cc: WireGuard mailing list, Roman Mamedov, zrm, StarBrilliant,
	Baptiste Jonglez, Joe Holden, Nico Schottelius, Vasili Pupkin,
	peter

Hey Jason,

Jason A. Donenfeld <Jason@zx2c4.com> writes:

> Hey folks,
>
> There seems to be a bit of confusion about *which* stage of
> fragmentation would be affected by the proposal, so I drew some
> diagrams to help illustrate what I'm talking about. Please take a
> look:
>
> https://data.zx2c4.com/potential-wg-fragmentation-proposal.png

I love the math: 2792 = 1420 + 1420 = 1500 + 1500

Joke aside, ...

> 1) Ingress fragmentation would not be affected by this and is not
> relevant for this discussion. This is the case in which a computer
> gets a packet for forwarding out of the wireguard interface, and it's
> larger than the interface's mtu, so the computer fragments it before
> passing it onto that interface. I'm not suggesting any change in this
> behavior.

I believe this is something wireguard cannot influence *anyway* as the
sending side can send any sized packet towards us.

> 2) Local egress fragmentation WOULD be affected by this and is the
> most relevant thing in this discussion. In this case, a packet that
> gets encrypted and winds up being larger than the mtu of the interface
> that the encrypted packet will go out of gets fragmented. In this
> case, we could likely respond with an ICMP packet or similar in-path
> error. But keep in mind this whole situation is local: it usually will
> only happen out of misconfiguration. The best fix for the diagram I
> drew would be for the administrator to decrease the MTU of the
> wireguard interface to 1412.

So how does that behave when the upstream interface or routes change?
Let's say WG MTU = 1412, original PMTU = 1500, and the PMTU decreases
to 1420. Would that reduce the WG MTU automatically to 1332? I guess
not. So what happens when packets arrive with size = 1420?

> 3) Path egress fragmentation COULD be affected by this, but doesn't
> have to be. In this case, we simply set "don't fragment" on encrypted
> egress packets, which means they won't be fragmented by other
> computers along the path.

That's true, but then it would be required to fragment them locally,
wouldn't it?

I'm trying to wrap my head around this in comparison to IPv6/IPv4: In
the IPv6 world we don't have fragmentation on the path, it's always
done by the sender. In the IPv4 world routers can fragment packets on
the way.

If I understand it correctly, you are suggesting that wireguard behave
a bit like an IPv6 router, albeit for both the v6 and the v4 world.
Does that comparison make sense?

I think it would be easier to understand, if there was a demo case, a
sample tunnel that rejects packets, if fragmentation is needed. What
would be the appropriate ICMP message for an IPv4 packet that does not
include the DF bit?

So far, I'm not fully convinced the approach is a smart one, especially
when it comes to network debugging, and given that we already have a
TTL that should act as loop prevention as well.

Best regards,

Nico

--
Sustainable and modern Infrastructures by ungleich.ch


* Re: potentially disallowing IP fragmentation on wg packets, and handling routing loops better
  2021-06-07 11:13   ` Roman Mamedov
@ 2021-06-07 11:27     ` Jason A. Donenfeld
  2021-06-07 11:46       ` Roman Mamedov
  0 siblings, 1 reply; 15+ messages in thread
From: Jason A. Donenfeld @ 2021-06-07 11:27 UTC (permalink / raw)
  To: Roman Mamedov
  Cc: WireGuard mailing list, zrm, StarBrilliant, Baptiste Jonglez,
	Joe Holden, Nico Schottelius, Vasili Pupkin, peter

Hi Roman,

On Mon, Jun 7, 2021 at 1:13 PM Roman Mamedov <rm@romanrm.net> wrote:
> In the L2 tunneling scenario the large VXLAN packets are generated locally, as
> it will be common for the same host (aka "the router") to be both a WG peer
> and a VXLAN VTEP, so it is going to be affected.

Can you walk me through your use case a bit more, so I can wrap my mind
around the requirements?

ingress --plain--> wireguard --wireguard[plain]--> vxlan --vxlan[wireguard[plain]]--> egress

So my question is, why can't you set wireguard's MTU to 80 bytes less
than vxlan's MTU? What's preventing that or making it infeasible?

Jason


* Re: potentially disallowing IP fragmentation on wg packets, and handling routing loops better
  2021-06-07 11:27     ` Jason A. Donenfeld
@ 2021-06-07 11:46       ` Roman Mamedov
  2021-06-07 11:55         ` Peter Linder
  2021-06-07 18:50         ` Roman Mamedov
  0 siblings, 2 replies; 15+ messages in thread
From: Roman Mamedov @ 2021-06-07 11:46 UTC (permalink / raw)
  To: Jason A. Donenfeld
  Cc: WireGuard mailing list, zrm, StarBrilliant, Baptiste Jonglez,
	Joe Holden, Nico Schottelius, Vasili Pupkin, peter

On Mon, 7 Jun 2021 13:27:10 +0200
"Jason A. Donenfeld" <Jason@zx2c4.com> wrote:

> Can you walk me through your use case a bit more, so I can wrap my mind
> around the requirements?
> 
> ingress --plain--> wireguard --wireguard[plain]--> vxlan --vxlan[wireguard[plain]]--> egress

Not sure I understand your scheme correctly. In any case, the path of a
packet would be...

On peer 1:

* plain Ethernet -> wrapped into VXLAN -> encrypted into WireGuard

On peer 2:

* decrypted from WireGuard -> unwrapped from VXLAN -> plain Ethernet

> So my question is, why can't you set wireguard's MTU to 80 bytes less
> than vxlan's MTU? What's preventing that or making it infeasible?

To transparently bridge two Ethernet LANs, a VXLAN interface needs to join an
L2 bridge. All interfaces that are members of a bridge must have the same MTU.

As such, br0 members on both sides:
  eth0 (MTU 1500)
  vx0 (MTU 1500)

VXLAN transports full L2 frames encapsulating them into UDP. To fit the
full 1500-byte packet and accounting for VXLAN and related IP overheads,
the resulting packet size is 1574 bytes.

So this same host that just generated the 1574-byte encapsulated VXLAN packet
with something it received via its eth0 port, now needs to send it further to
its WG peer(s). For this to succeed, the in-tunnel WG MTU needs to be 1574 or
more, not 1412 or 1420, as VXLAN itself can't be fragmented[1]; or even if it
could, that would mean a much worse overhead ratio than currently.

[1] https://datatracker.ietf.org/doc/html/rfc7348#section-4.3
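
To make the consequence concrete, here is a rough sketch of what happens to one such 1574-byte VXLAN packet once WireGuard encrypts it and sends it over a 1500-byte IPv4 link. The 1574 figure is taken from the message as given; the WireGuard overhead (16-byte data header, 16-byte tag, 8-byte UDP header) is an assumption, padding is ignored, and the function name is hypothetical.

```python
import math

VXLAN_PACKET = 1574            # inner packet size, as stated in the message
WG_HEADER, WG_TAG, UDP_HEADER = 16, 16, 8  # assumed WireGuard/UDP overhead


def ipv4_fragment_count(ip_payload_len: int, link_mtu: int,
                        ip_header: int = 20) -> int:
    """Number of IPv4 fragments needed to carry ip_payload_len bytes.

    Every fragment except the last must carry a multiple of 8 payload bytes.
    """
    per_fragment = (link_mtu - ip_header) // 8 * 8
    return math.ceil(ip_payload_len / per_fragment)


# UDP datagram carrying the encrypted VXLAN packet (padding ignored).
wg_udp_datagram = UDP_HEADER + WG_HEADER + VXLAN_PACKET + WG_TAG

if __name__ == "__main__":
    print(wg_udp_datagram)
    print(ipv4_fragment_count(wg_udp_datagram, link_mtu=1500))
```

Under these assumptions every full-size bridged frame becomes two outer fragments, which is the behaviour the bridging setup relies on and which setting DF on the outer packets would turn into a drop.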

-- 
With respect,
Roman


* Re: potentially disallowing IP fragmentation on wg packets, and handling routing loops better
  2021-06-07 11:46       ` Roman Mamedov
@ 2021-06-07 11:55         ` Peter Linder
  2021-06-07 18:50         ` Roman Mamedov
  1 sibling, 0 replies; 15+ messages in thread
From: Peter Linder @ 2021-06-07 11:55 UTC (permalink / raw)
  To: Roman Mamedov, Jason A. Donenfeld
  Cc: WireGuard mailing list, zrm, StarBrilliant, Baptiste Jonglez,
	Joe Holden, Nico Schottelius, Vasili Pupkin

This is indeed the case for me, spot on.

On 2021-06-07 13:46, Roman Mamedov wrote:
> So this same host that just generated the 1574-byte encapsulated VXLAN packet
> with something it received via its eth0 port, now needs to send it further to
> its WG peer(s). For this to succeed, the in-tunnel WG MTU needs to be 1574 or
> more, not 1412 or 1420, as VXLAN itself can't be fragmented[1]; or even if it
> could, that would mean a much worse overhead ratio than currently.


* Re: potentially disallowing IP fragmentation on wg packets, and handling routing loops better
  2021-06-06 11:14 ` Peter Linder
@ 2021-06-07 11:58   ` Derek Fawcus
  0 siblings, 0 replies; 15+ messages in thread
From: Derek Fawcus @ 2021-06-07 11:58 UTC (permalink / raw)
  To: Peter Linder; +Cc: wireguard

On Sun, Jun 06, 2021 at 01:14:16PM +0200, Peter Linder wrote:
> This would break things for me. We're doing a lot of L2 over L3 site to 
> site stuff and we are using wireguard as the outer layer. Inner layer is 
> vxlan or l2tpv3.
>
> In particular, people connect lots of stuff with no regard for MTU. For 
> some things it's also very hard to change so we just assume people 
> don't. Since the L3 network typically has the same MTU as the inner L2 
> network, we need fragmentation. There is no practical way to be able to 
> tell hosts on the L2 network about the limited mtu, for all we know they 
> don't even run IP....

I've not looked into VXLAN much, but for L2TP you always have recourse
to RFC 4623, where the MRU & MRRU can be exchanged.

DF

* Re: potentially disallowing IP fragmentation on wg packets, and handling routing loops better
  2021-06-07 11:46       ` Roman Mamedov
  2021-06-07 11:55         ` Peter Linder
@ 2021-06-07 18:50         ` Roman Mamedov
  1 sibling, 0 replies; 15+ messages in thread
From: Roman Mamedov @ 2021-06-07 18:50 UTC (permalink / raw)
  To: Jason A. Donenfeld
  Cc: WireGuard mailing list, zrm, StarBrilliant, Baptiste Jonglez,
	Joe Holden, Nico Schottelius, Vasili Pupkin, peter

On Mon, 7 Jun 2021 16:46:17 +0500
Roman Mamedov <rm@romanrm.net> wrote:

> On Mon, 7 Jun 2021 13:27:10 +0200
> "Jason A. Donenfeld" <Jason@zx2c4.com> wrote:
> 
> > Can you walk me through your use case a bit more, so I can wrap my mind
> > around the requirements?
> > 
> > ingress --plain--> wireguard --wireguard[plain]--> vxlan --vxlan[wireguard[plain]]--> egress
> 
> Not sure I understand your scheme correctly. In any case, the path of a
> packet would be...
> 
> On peer 1:
> 
> * plain Ethernet -> wrapped into VXLAN -> encrypted into WireGuard
> 
> On peer 2:
> 
> * decrypted from WireGuard -> unwrapped from VXLAN -> plain Ethernet
> 
> > So my question is, why can't you set wireguard's MTU to 80 bytes less
> > than vxlan's MTU? What's preventing that or making it infeasible?
> 
> To transparently bridge two Ethernet LANs, a VXLAN interface needs to join an
> L2 bridge. All interfaces that are members of a bridge must have the same MTU.
> 
> As such, br0 members on both sides:
>   eth0 (MTU 1500)
>   vx0 (MTU 1500)
> 
> VXLAN transports full L2 frames encapsulating them into UDP. To fit the
> full 1500-byte packet and accounting for VXLAN and related IP overheads,
> the resulting packet size is 1574 bytes.
> 
> So this same host that just generated the 1574-byte encapsulated VXLAN packet
> with something it received via its eth0 port, now needs to send it further to
> its WG peer(s). For this to succeed, the in-tunnel WG MTU needs to be 1574 or
> more, not 1412 or 1420, as VXLAN itself can't be fragmented[1]; or even if it
> could, that would mean a much worse overhead ratio than currently.
> 
> [1] https://datatracker.ietf.org/doc/html/rfc7348#section-4.3

In case you are not convinced by this case, would you consider at least
allowing fragmentation when WG's in-tunnel MTU is set to >=1500? Because this
is the user effectively saying "yes I know this is not gonna fit in one
packet, I want to rely on WG packets being fragmented", but without the need
for extra knobs.
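
To put numbers on this, here is a rough sketch in Python. It assumes
WireGuard's per-packet overhead is 80 bytes over IPv6 (the figure quoted
above) and 60 bytes over IPv4, and it ignores WireGuard's padding. The
names `outer_size` and `may_fragment` are made up for illustration; this
is only a sketch of the suggested rule, not actual WireGuard code.

```python
# Assumed per-packet WireGuard overhead in bytes: outer IP header + UDP (8)
# + WireGuard data header (16) + authentication tag (16).
WG_OVERHEAD = {"ipv4": 20 + 8 + 16 + 16, "ipv6": 40 + 8 + 16 + 16}


def outer_size(inner_len, family="ipv6"):
    """Size of the encrypted UDP/IP packet carrying an inner packet (padding ignored)."""
    return inner_len + WG_OVERHEAD[family]


def may_fragment(wg_mtu, threshold=1500):
    """Hypothetical rule from this mail: an in-tunnel MTU >= 1500 is read as
    the user accepting fragmentation of the encrypted packets."""
    return wg_mtu >= threshold


if __name__ == "__main__":
    # In-tunnel MTU values mentioned in this thread.
    for wg_mtu in (1412, 1420, 1574):
        print(wg_mtu, outer_size(wg_mtu), may_fragment(wg_mtu))
```

Under these assumptions a 1574-byte VXLAN packet becomes a 1654-byte
encrypted packet over IPv6, which cannot cross a 1500-byte link without
being fragmented.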

-- 
With respect,
Roman

* Re: potentially disallowing IP fragmentation on wg packets, and handling routing loops better
  2021-06-07  9:34 ` Jason A. Donenfeld
  2021-06-07 11:13   ` Roman Mamedov
  2021-06-07 11:18   ` Nico Schottelius
@ 2021-06-09 23:26   ` Vasili Pupkin
  2 siblings, 0 replies; 15+ messages in thread
From: Vasili Pupkin @ 2021-06-09 23:26 UTC (permalink / raw)
  To: Jason A. Donenfeld, WireGuard mailing list
  Cc: Roman Mamedov, zrm, StarBrilliant, Baptiste Jonglez, Joe Holden,
	Nico Schottelius, peter

Hi,

Regarding my example with chained VPNs: it is a misconfiguration, but
not an intentional one, and no responsible administrator can fix it,
because the client really has no way to tell a VPN provider what MTU it
needs. Technically, VPN providers could offer such an interface to
clients, but none of the four VPN providers I've tried during the last
three years has one. Usually VPN providers just use the default
"wg-quick" MTU in their server configs.

On the client side this is case (2), local egress fragmentation, since
the client knows both MTU values; but on the server side it already
needs case (3), path egress fragmentation, if I understand the
terminology correctly. Consider a packet on the route: remote host ->
VPN provider 1 -> VPN provider 2 -> client. An unencrypted packet comes
from the remote host, and VPN provider 1 just sends a bigger encrypted
packet to VPN provider 2. Even if VPN provider 2 responds with ICMP
Fragmentation Needed to VPN provider 1, that response will be ignored
and will not be repeated in the unencrypted channel to the remote host.
So the remote host will never know why the packet was dropped, and this
will slow down PMTUD. How difficult would it be, and what security
implications would it have, for WireGuard to capture ICMP Fragmentation
Needed responses and repeat them in the unencrypted channel?
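
A rough sketch of the size arithmetic in the chained case, in Python. It
assumes both providers use a tunnel MTU of 1420 and that each WireGuard
layer adds 80 bytes (IPv6 outer header); the function name is made up
for illustration and padding is ignored.

```python
WG_OVERHEAD_V6 = 80  # assumed: IPv6 (40) + UDP (8) + WireGuard header (16) + tag (16)


def encrypted_sizes(payload_len, layers):
    """Return the packet size after each successive layer of WireGuard encapsulation."""
    sizes = []
    size = payload_len
    for _ in range(layers):
        size += WG_OVERHEAD_V6
        sizes.append(size)
    return sizes


if __name__ == "__main__":
    tunnel_mtu = 1420  # assumed MTU of provider 2's tunnel
    # A full-size packet for provider 1's tunnel, encrypted once by provider 1.
    after_provider_1 = encrypted_sizes(1420, layers=1)[0]
    print(after_provider_1, after_provider_1 > tunnel_mtu)
```

Under these assumptions, the packet encrypted by provider 1 is 1500
bytes, which is larger than provider 2's 1420-byte tunnel MTU, so
provider 2 must either fragment it or drop it and send an ICMP error
that never reaches the remote host.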

On 07.06.2021 12:34, Jason A. Donenfeld wrote:
> Hey folks,
>
> There seems to be a bit of confusion about *which* stage of
> fragmentation would be affected by the proposal, so I drew some
> diagrams to help illustrate what I'm talking about. Please take a
> look:
>
> https://data.zx2c4.com/potential-wg-fragmentation-proposal.png
>
> 1) Ingress fragmentation would not be affected by this and is not
> relevant for this discussion. This is the case in which a computer
> gets a packet for forwarding out of the wireguard interface, and it's
> larger than the interface's mtu, so the computer fragments it before
> passing it onto that interface. I'm not suggesting any change in this
> behavior.
>
> 2) Local egress fragmentation WOULD be affected by this and is the
> most relevant thing in this discussion. In this case, a packet that
> gets encrypted and winds up being larger than the mtu of the interface
> that the encrypted packet will go out of gets fragmented. In this
> case, we could likely respond with an ICMP packet or similar in-path
> error. But keep in mind this whole situation is local: it usually will
> only happen out of misconfiguration. The best fix for the diagram I
> drew would be for the administrator to decrease the MTU of the
> wireguard interface to 1412.
>
> 3) Path egress fragmentation COULD be affected by this, but doesn't
> have to be. In this case, we simply set "don't fragment" on encrypted
> egress packets, which means they won't be fragmented by other
> computers along the path.
>
> So, of those concerned about this, which concerns are actually about
> (2) and (3)? Of those, which ones are about (2)? If you have concerns
> specifically about (2) that couldn't be fixed with reasonable system
> administration, I'd like to hear why and what the setup is that leads
> to that situation.
>
> As an aside, Roman asked about TTL. When tunneling, the outer packet
> header always must take the new TTL of the route to the tunnel
> endpoint, and not do anything with the potentially much smaller inner
> TTL. So with tunneling, you can't quite rely on the TTL to drop to
> zero as you'd wish. Hence, I'm interested in using the natural packet
> size expansion instead.
>
> Thanks for the discussion so far. I'm very interested to read
> clarifying points about applicability to case (2) (and to a lesser
> extent, about case (3)).
>
> Thanks,
> Jason
>

Thread overview: 15+ messages
2021-06-06  9:13 potentially disallowing IP fragmentation on wg packets, and handling routing loops better Jason A. Donenfeld
2021-06-06  9:32 ` Nico Schottelius
2021-06-06 10:39 ` Vasili Pupkin
2021-06-06 11:14 ` Peter Linder
2021-06-07 11:58   ` Derek Fawcus
2021-06-06 19:03 ` Roman Mamedov
2021-06-06 22:33   ` Joe Holden
2021-06-07  9:34 ` Jason A. Donenfeld
2021-06-07 11:13   ` Roman Mamedov
2021-06-07 11:27     ` Jason A. Donenfeld
2021-06-07 11:46       ` Roman Mamedov
2021-06-07 11:55         ` Peter Linder
2021-06-07 18:50         ` Roman Mamedov
2021-06-07 11:18   ` Nico Schottelius
2021-06-09 23:26   ` Vasili Pupkin
