Development discussion of WireGuard
* IPv6 and PPPoE with MSSFIX
@ 2023-08-22 20:39 Luiz Angelo Daros de Luca
  2023-08-23 14:58 ` Marek Küthe
  2023-08-23 17:07 ` Daniel Gröber
  0 siblings, 2 replies; 8+ messages in thread
From: Luiz Angelo Daros de Luca @ 2023-08-22 20:39 UTC (permalink / raw)
  To: WireGuard mailing list

Hello,

We noticed an issue with clients that use PPPoE and connect to WG
over IPv6. Both sides start to fragment the encrypted packets, leading
to a severe degradation in performance. We reduced the wireguard MTU
from the default 1420 to 1400 and the issue was solved. However, I
wonder if it could be fixed with MSSFIX (in my case, its nftables
equivalent).
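
For reference, the rule I have in mind is something like this sketch
(the "inet mangle forward" table/chain names are just placeholders for
my setup):

    $ nft add rule inet mangle forward tcp flags syn \
        tcp option maxseg size set rt mtu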

The server does know that the remote address has a smaller MTU, as it
fragments the packets accordingly when any VPN peer sends some traffic.
The traffic inside the VPN does adjust the TCP MSS to fit into the VPN
interface MTU (1420 by default, now 1400).

I could dynamically add firewall rules to clamp the MSS per allowed_ips
entry but, theoretically, the kernel has all the info to do that
automatically. I wonder if MSSFIX could detect the best MTU for a
specific address through the wireguard tunnel. It should consider the
peer-to-peer PMTU, the IP protocol wireguard is using, and the normal
wireguard header overhead.

Regards,

---
     Luiz Angelo Daros de Luca
            luizluca@gmail.com

* Re: IPv6 and PPPoE with MSSFIX
  2023-08-22 20:39 IPv6 and PPPoE with MSSFIX Luiz Angelo Daros de Luca
@ 2023-08-23 14:58 ` Marek Küthe
  2023-08-23 17:14   ` Daniel Gröber
  2023-08-23 17:07 ` Daniel Gröber
  1 sibling, 1 reply; 8+ messages in thread
From: Marek Küthe @ 2023-08-23 14:58 UTC (permalink / raw)
  To: wireguard; +Cc: luizluca

On Tue, 22 Aug 2023 17:39:23 -0300
Luiz Angelo Daros de Luca <luizluca@gmail.com> wrote:

> Hello,
> 
> We noticed an issue with clients that use PPPoE and connect to WG
> over IPv6. Both sides start to fragment the encrypted packets, leading
> to a severe degradation in performance. We reduced the wireguard MTU
> from the default 1420 to 1400 and the issue was solved. However, I
> wonder if it could be fixed with MSSFIX (in my case, its nftables
> equivalent).

PPPoE adds 8 bytes of overhead so that an MTU of 1432 can be used. I
also have to do this at home with my DSL line for example.
The MTU should be set on each side (on both peers) for this to work.
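
For reference, the arithmetic behind that 1432 (assuming an IPv4
underlay; this mirrors the usual wg overhead math):

      1500 Ethernet payload
       -20 IPv4 header
        -8 UDP header
       -32 Wg header
        -8 PPPoE
    ===================
      1432 wg tunnel MTU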

> The server does know that the remote address has a smaller MTU, as it
> fragments the packets accordingly when any VPN peer sends some traffic.

Presumably the OS on the server does this, not WireGuard itself. I
could imagine that the server first receives an ICMPv6 Packet Too Big
message and only then performs the fragmentation.

> The traffic inside the VPN does adjust the TCP MSS to fit into the VPN
> interface MTU (1420 by default, now 1400).

Keep in mind that MSS clamping only applies to TCP; other layer-4
protocols like UDP might still have problems.

> I could dynamically add firewall rules to clamp the MSS per allowed_ips
> entry but, theoretically, the kernel has all the info to do that
> automatically. I wonder if MSSFIX could detect the best MTU for a
> specific address through the wireguard tunnel. It should consider the
> peer-to-peer PMTU, the IP protocol wireguard is using, and the normal
> wireguard header overhead.

As far as I know, WireGuard does not do PMTU discovery.

-- 
Marek Küthe
m.k@mk16.de
er/ihm he/him

* Re: IPv6 and PPPoE with MSSFIX
  2023-08-22 20:39 IPv6 and PPPoE with MSSFIX Luiz Angelo Daros de Luca
  2023-08-23 14:58 ` Marek Küthe
@ 2023-08-23 17:07 ` Daniel Gröber
  2023-08-23 19:55   ` Luiz Angelo Daros de Luca
  1 sibling, 1 reply; 8+ messages in thread
From: Daniel Gröber @ 2023-08-23 17:07 UTC (permalink / raw)
  To: Luiz Angelo Daros de Luca; +Cc: WireGuard mailing list

Hi Luiz,

On Tue, Aug 22, 2023 at 05:39:23PM -0300, Luiz Angelo Daros de Luca wrote:
> We noticed an issue with clients that use PPPoE and connect to WG
> over IPv6. Both sides start to fragment the encrypted packets, leading
> to a severe degradation in performance. We reduced the wireguard MTU
> from the default 1420 to 1400 and the issue was solved. However, I
> wonder if it could be fixed with MSSFIX (in my case, its nftables
> equivalent).
> 
> The server does know that the remote address has a smaller MTU, as it
> fragments the packets accordingly when any VPN peer sends some traffic.
> The traffic inside the VPN does adjust the TCP MSS to fit into the VPN
> interface MTU (1420 by default, now 1400).

Debug note: you can dump the current PMTU info on Linux using

     $ ip -6 route show cache

Look at the "mtu" field of the route corresponding to the destination host
you're looking at.

IIRC `ip route get` will also print the PMTU currently in effect.
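
For example (hypothetical destination; look for the "mtu" attribute in
the output):

    $ ip -6 route get 2001:db8::1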

> I could dynamically add firewall rules to clamp the MSS per allowed_ips
> entry but, theoretically, the kernel has all the info to do that
> automatically. I wonder if MSSFIX could detect the best MTU for a
> specific address through the wireguard tunnel. It should consider the
> peer-to-peer PMTU, the IP protocol wireguard is using, and the normal
> wireguard header overhead.

Interesting idea, Luiz. So if I understand correctly, you have a wg
device with multiple peers, where only some of them need the reduced
MTU, and you'd like to use the maximum possible MTU for each peer.

As things are, this won't "just work" with MSSFIX, because the wg
device won't generate ICMP packet-too-big errors for packets sent to it
for encapsulation, regardless of the underlying PMTU; rather, the wg
device will always fragment when the resulting encapsulated packet
doesn't fit, as you've observed.

AFAIK MSSFIX will only look at the actual outgoing route MTU and
calculate the MSS from that. Since wg never causes (dynamic) PMTU
entries to be created, that won't work.

However, we can also just create "static" PMTU entries. As we've seen
above, Linux uses the "mtu" route attribute to determine the actual
PMTU behind a route, as opposed to the netdev MTU, which you should
think of as the upper limit of what a link can support.

So you can try adding a route specific for the peer that's behind PPPoE
with the reduced PMTU. Assuming 2001:db8:1432::/64 is this peer's
AllowedIPs:

    $ ip route add 2001:db8:1432::/64 dev wg0 mtu 1432 proto static

You should be able to add this in PostUp in your wg.conf. The "proto
static" is optional; I just like to use it to mark administratively
created routes.

You're still going to want to set the peer's wg device MTU to 1432, or
you can create "mtu" routes in a similar fashion there. Up to you.
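
For instance, a minimal server-side sketch in wg.conf (hypothetical
prefix; %i expands to the interface name under wg-quick, and if
wg-quick already installed a route for this AllowedIPs prefix you may
need `ip route change` instead of `add`):

    [Interface]
    ...
    PostUp = ip route add 2001:db8:1432::/64 dev %i mtu 1432 proto static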

Also note that MSSFIX, or the nft equivalent mouthful `tcp flags syn
tcp option maxseg size set rt mtu`, is really only appropriate for IPv4
traffic, since IPv4 PMTU is broken by too many networks. However, over
in always-sunny IPv6 land PMTU does work and should be preferred to
mangling TCP headers. The static PMTU route we created should cause the
kernel to start sending the appropriate ICMPv6 packet-too-big errors
when it's configured for IPv6 forwarding.
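
That is, something like (the usual sysctl knob):

    $ sysctl -w net.ipv6.conf.all.forwarding=1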

You can test the PTB behaviour with `ping 2001:db8:1432::1 -s3000 -M
do`. The -s3000 sends large packets (careful with the size: that's the
ICMP _payload size_, so it's not equivalent to the MTU), and `-M do`
forbids local fragmentation so you can see when PMTU is doing its job.
You'll get something like "ping: local error: message too long, mtu:
XXXX" showing the PMTU value if ICMP-PTB error generation is working
along the path.

--Daniel

* Re: IPv6 and PPPoE with MSSFIX
  2023-08-23 14:58 ` Marek Küthe
@ 2023-08-23 17:14   ` Daniel Gröber
  2023-08-23 19:01     ` Luiz Angelo Daros de Luca
  0 siblings, 1 reply; 8+ messages in thread
From: Daniel Gröber @ 2023-08-23 17:14 UTC (permalink / raw)
  To: Marek Küthe; +Cc: wireguard, luizluca

Hi,

On Wed, Aug 23, 2023 at 04:58:40PM +0200, Marek Küthe wrote:
> PPPoE adds 8 bytes of overhead so that an MTU of 1432 can be used. I
> also have to do this at home with my DSL line for example.
> The MTU should be set on each side (on both peers) for this to work.

Oh, I just realized I used the 1432 MTU in my earlier reply based on
Marek's math, but since Luiz's underlay network is IPv6 this is not
actually correct. MTU=1440 is only correct on top of IPv4; for IPv6 the
"optimal" MTU is 1420, so with PPPoE involved that's MTU=1412.

  1500 Ethernet payload
   -40 IPv6 header
    -8 UDP header
   -32 Wg header
    -8 PPPoE
===================
  1412 wg tunnel MTU

--Daniel

* Re: IPv6 and PPPoE with MSSFIX
  2023-08-23 17:14   ` Daniel Gröber
@ 2023-08-23 19:01     ` Luiz Angelo Daros de Luca
  2023-08-23 20:47       ` Hugo Slabbert
  0 siblings, 1 reply; 8+ messages in thread
From: Luiz Angelo Daros de Luca @ 2023-08-23 19:01 UTC (permalink / raw)
  To: Daniel Gröber; +Cc: Marek Küthe, wireguard

Hi Daniel,

> On Wed, Aug 23, 2023 at 04:58:40PM +0200, Marek Küthe wrote:
> > PPPoE adds 8 bytes of overhead so that an MTU of 1432 can be used. I
> > also have to do this at home with my DSL line for example.
> > The MTU should be set on each side (on both peers) for this to work.
>
> Oh, I just realized I used the 1432 MTU in my earlier reply based on
> Marek's math, but since Luiz's underlay network is IPv6 this is not
> actually correct. MTU=1440 is only correct on top of IPv4; for IPv6 the
> "optimal" MTU is 1420, so with PPPoE involved that's MTU=1412.
>
>   1500 Ethernet payload
>    -40 IPv6 header
>     -8 UDP header
>    -32 Wg header
>     -8 PPPoE
> ===================
>   1412 wg tunnel MTU

In my case, the PPPoE interface got MTU=1480. They might be stacking
something else on top of it, or PPPoE might have optional fields. I
read somewhere that PPPoE might use either 8 or 20 bytes, but I'm not
an expert on PPPoE. If I don't control both sides, I would use 1400 by
default.

> --Daniel

* Re: IPv6 and PPPoE with MSSFIX
  2023-08-23 17:07 ` Daniel Gröber
@ 2023-08-23 19:55   ` Luiz Angelo Daros de Luca
  0 siblings, 0 replies; 8+ messages in thread
From: Luiz Angelo Daros de Luca @ 2023-08-23 19:55 UTC (permalink / raw)
  To: Daniel Gröber; +Cc: WireGuard mailing list

> [...]
>
> So you can try adding a route specific for the peer that's behind PPPoE
> with the reduced PMTU. Assuming 2001:db8:1432::/64 is this peer's
> AllowedIPs:
>
>     $ ip route add 2001:db8:1432::/64 dev wg0 mtu 1432 proto static
>
> [...]

I didn't think about adding the MTU directly to the routing table. Now
it is more interesting. Wireguard adds a route for each allowed_ips
entry. If we detect a PMTU change for a target, we could adjust those
routes to avoid fragmentation (see the sketch below). I just don't know
if we would break the connection if we modify the MTU up or down during
a transfer. I believe increasing it won't matter for existing
connections, as the MSS is already negotiated, and bringing it down
will just fragment the traffic. Anyway, I believe it is better to
fragment the plain packet than the encrypted one. And for new TCP
connections, the firewall can clamp the TCP MSS to the optimal value,
even considering whether it is using IPv4 or IPv6.
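
Something like this per allowed_ips route, I imagine (hypothetical
prefix; 1412 would come from the observed PMTU minus the wg overhead):

    $ ip route change 2001:db8:1432::/64 dev wg0 mtu 1412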

* Re: IPv6 and PPPoE with MSSFIX
  2023-08-23 19:01     ` Luiz Angelo Daros de Luca
@ 2023-08-23 20:47       ` Hugo Slabbert
  2023-08-28 22:22         ` Luiz Angelo Daros de Luca
  0 siblings, 1 reply; 8+ messages in thread
From: Hugo Slabbert @ 2023-08-23 20:47 UTC (permalink / raw)
  To: Luiz Angelo Daros de Luca; +Cc: Daniel Gröber, Marek Küthe, wireguard

> In my case, the PPPoE interface got MTU=1480. They might be stacking
> something else on top of it, or PPPoE might have optional fields. I
> read somewhere that PPPoE might use either 8 or 20 bytes, but I'm not
> an expert on PPPoE.

For ref, an L2TP + PPPoE stack isn't too uncommon, and gives you 20
bytes of overhead: 12 bytes of L2TP + 8 bytes of PPPoE.
* Re: IPv6 and PPPoE with MSSFIX
  2023-08-23 20:47       ` Hugo Slabbert
@ 2023-08-28 22:22         ` Luiz Angelo Daros de Luca
  0 siblings, 0 replies; 8+ messages in thread
From: Luiz Angelo Daros de Luca @ 2023-08-28 22:22 UTC (permalink / raw)
  To: Hugo Slabbert; +Cc: Daniel Gröber, Marek Küthe, wireguard

Hello,

I did some proof-of-concept tests and got nice results. Here is my
current script: https://github.com/luizluca/wireguard-ipv6-pmtu

It runs as a shell script and updates the allowed_ips routes (IPv4 and
IPv6) when there is a cached PMTU to that endpoint (or the local
interface is using a smaller MTU). It just works as expected, avoiding
the fragmentation on the fly for IPv6-connected peers. It must run
periodically, as "ip monitor" does not emit events for cached routes.
The best results come when you run it on both sides, as each side can
only fix the traffic it sends.
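
The core of the idea, as a stripped-down sketch (not the actual script;
addresses are hypothetical, the route must already exist, and error
handling is omitted):

    # Clamp one peer's allowed_ips route to the cached endpoint PMTU
    # minus the wireguard overhead (IPv6 underlay: 40 + 8 + 32 bytes).
    ENDPOINT=2001:db8:beef::1       # peer's underlay endpoint
    ALLOWED=2001:db8:1432::/64      # peer's allowed_ips prefix
    OVERHEAD=$((40 + 8 + 32))

    PMTU=$(ip -6 route get "$ENDPOINT" 2>/dev/null \
           | grep -o 'mtu [0-9]*' | cut -d' ' -f2)
    if [ -n "$PMTU" ]; then
        ip -6 route change "$ALLOWED" dev wg0 mtu $((PMTU - OVERHEAD))
    fi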

As we have already discussed, standard IPv4 has a smaller header, and
the default wireguard MTU over IPv4 leaves some room for most tunneling
protocols (1500-20-8-32 = 1440, so the default 1420 has 20 bytes to
spare).

I hit some interesting problems along the way:
1) "ip route get" might fail if all routes that would match also
include a "from". You need to find out the source address wireguard is
using before testing the route. I'm digging it out of the conntrack
table, but I wish there was a better way.
2) PMTU runs in cycles. The kernel creates a temporary cached route
with the MTU once it receives a "packet too big" answer. However, until
that route is gone (by expiring, for example), there is no way to
generate traffic that will retrigger the "packet too big" or refresh
the route. Once the route is gone, the script will remove the MTU
limitation, wireguard might eventually trigger a new "packet too big",
and on its next run the script can adjust the MTU again. We would need
to add some state to the script to know that a cached route is gone and
try to retrigger the PMTU discovery before removing the MTU limitation.
We could also take some brute-force approach, like pinging every peer
with a large packet before each cycle (see the example below), or
simply keep the MTU limitation forever, as it would not hurt that much.
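
The brute-force probe would be something like this per endpoint
(hypothetical address; -s 1452 = 1500-40-8, so the resulting IPv6
packet fills a 1500-byte link, and -M do forbids local fragmentation):

    $ ping -6 2001:db8:beef::1 -c 1 -s 1452 -M do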

For those who want to play with it, have fun!

Regards,

Luiz
