Development discussion of WireGuard
From: leon@is.currently.online
To: wireguard@lists.zx2c4.com
Cc: Leon Schuermann <leon@is.currently.online>
Subject: [RFC PATCH 1/4] netdevice: add ndo_lookup_mtu for dynamically determining MTU
Date: Wed, 29 Dec 2021 00:45:23 +0100
Message-ID: <20211228234524.633509-2-leon@is.currently.online>
In-Reply-To: <20211228234524.633509-1-leon@is.currently.online>

From: Leon Schuermann <leon@is.currently.online>

Add an optional function `ndo_lookup_mtu` to `struct net_device_ops`.
Other parts of the network stack can call it to let the destination
netdevice determine the allowed MTU for a packet. The lookup is done
on a per-packet basis and is passed the `struct sk_buff` holding the
packet contents.

The information obtained through this method may be cached by other
parts of the network stack, for instance by the path MTU discovery
(PMTUD) mechanism. It is not guaranteed that this function will be
called for every packet, nor even that it is called for a single
packet of a given flow. When this function is not implemented, or when
it returns -ENODATA, no statement about the permitted MTU is made and
the networking stack falls back to the device MTU values. These
properties make this mechanism capable of providing a "suggestion" for
a packet's MTU, deviating from the default device MTU.
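
The fallback rule described above can be illustrated with a small
userspace model. This is only a sketch: the structs are reduced
stand-ins for the kernel types, and `example_effective_mtu` and the two
`example_lookup_*` callbacks are hypothetical names that are not part
of this patch. Treating every non-positive return value like -ENODATA
is an assumption of the sketch as well.

```c
#include <errno.h>
#include <stddef.h>

/* Userspace stand-ins for the kernel types; only the fields used here. */
struct sk_buff {
	unsigned int len;
};

struct net_device;

struct net_device_ops {
	int (*ndo_lookup_mtu)(const struct sk_buff *skb,
			      const struct net_device *dev);
};

struct net_device {
	unsigned int mtu;	/* regular device MTU */
	const struct net_device_ops *netdev_ops;
};

/* Hypothetical callback: suggests an MTU of 1280 for every packet. */
static int example_lookup_1280(const struct sk_buff *skb,
			       const struct net_device *dev)
{
	(void)skb;
	(void)dev;
	return 1280;
}

/* Hypothetical callback: makes no statement about the MTU. */
static int example_lookup_nodata(const struct sk_buff *skb,
				 const struct net_device *dev)
{
	(void)skb;
	(void)dev;
	return -ENODATA;
}

/*
 * Hypothetical caller-side helper: use the MTU suggested by the device
 * if there is one, otherwise fall back to the device MTU.
 */
static unsigned int example_effective_mtu(const struct sk_buff *skb,
					  const struct net_device *dev)
{
	int mtu;

	/* Callback not implemented: fall back to the device MTU. */
	if (!dev->netdev_ops || !dev->netdev_ops->ndo_lookup_mtu)
		return dev->mtu;

	mtu = dev->netdev_ops->ndo_lookup_mtu(skb, dev);

	/* -ENODATA (assumed here: any non-positive value): fall back. */
	if (mtu <= 0)
		return dev->mtu;

	return (unsigned int)mtu;
}
```

With a device MTU of 1500, this helper yields 1280 for a device using
`example_lookup_1280` and 1500 for a device that uses
`example_lookup_nodata` or provides no callback at all.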

The device is allowed to announce MTU values lower or higher than the
minimum and maximum device MTU respectively. Whether such MTU values
will be respected is up to the implementation.

Still, even though both implementing and respecting this mechanism
are optional, it has some interesting consequences. Being able to
inspect the entire packet buffer, the destination netdevice
implementation can control MTUs at flow granularity. For instance, it
could be used to
allow two devices on a shared Ethernet segment to communicate with
each other using a large (> 1500 byte) MTU, while using a lower MTU
for other devices.

The immediate motivation for these changes provides another example of
this mechanism being useful: when using WireGuard, peers can reside
behind paths with varying MTU restrictions. However, PMTUD does not
work across these tunnel links, as WireGuard cannot accept
unauthenticated ICMP responses. Thus it will continue to send packets
that are too large over lower-MTU links. With this mechanism WireGuard
can, on a per-peer
granularity, reduce the MTU, without limiting the overall device
MTU. Furthermore, it can employ in-band PMTUD mechanisms to resolve
these values automatically. While an MTU metric can be set for
specific FIB routes and thus lower the MTU for individual peers,
doing so completely disables PMTUD on the entire route. While
regular PMTUD does not work over the tunnel link, it should still be
usable on the rest of the route. Furthermore, when employing an
in-band per-peer PMTUD mechanism, modifying the FIB to store the
detected MTU is inelegant at best.
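
As a rough illustration of the per-peer idea, the following userspace
sketch maps a destination address to a peer and returns that peer's
MTU, or -ENODATA when nothing is configured. The names
`struct example_peer` and `example_peer_lookup_mtu` are hypothetical,
and the linear first-match search over IPv4 prefixes is a
simplification chosen for brevity; it is not how WireGuard resolves
peers and not the implementation from this series.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-peer record; addresses are in host byte order. */
struct example_peer {
	uint32_t allowed_net;	/* network part of the allowed prefix */
	uint32_t allowed_mask;	/* netmask of the allowed prefix */
	int mtu;		/* per-peer MTU, 0 if none is configured */
};

/*
 * Return the MTU of the first peer whose prefix contains daddr, or
 * -ENODATA if no peer matches or the peer has no MTU configured.
 */
static int example_peer_lookup_mtu(const struct example_peer *peers,
				   size_t n_peers, uint32_t daddr)
{
	size_t i;

	for (i = 0; i < n_peers; i++) {
		if ((daddr & peers[i].allowed_mask) != peers[i].allowed_net)
			continue;

		/* Matching peer found; report its MTU if it has one. */
		return peers[i].mtu > 0 ? peers[i].mtu : -ENODATA;
	}

	return -ENODATA;
}
```

A caller that receives -ENODATA from such a lookup would keep using the
device MTU, so peers without a configured MTU behave as before.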

Signed-off-by: Leon Schuermann <leon@is.currently.online>
---
 include/linux/netdevice.h | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 7c3da0e1ea9d..d9d59b756f57 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1279,6 +1279,16 @@ struct netdev_net_notifier {
  * struct net_device *(*ndo_get_peer_dev)(struct net_device *dev);
  *	If a device is paired with a peer device, return the peer instance.
  *	The caller must be under RCU read context.
+ * int (*ndo_lookup_mtu)(const struct sk_buff *skb,
+ *			 const struct net_device *dev);
+ *	For devices supporting dynamic lookup of the MTU for individual
+ *	skb packets, this function returns the MTU for the passed skb.
+ *	A return value of -ENODATA must be treated as if the device does
+ *	not support this feature. It is not guaranteed that this function will
+ *	be called for every packet presented to the ndo_start_xmit function.
+ *	A device must always accept packets of the announced min/max device MTU.
+ *	This function may be used to potentially allow MTU sizes lower/higher
+ *	than the min/max device MTU respectively.
  */
 struct net_device_ops {
 	int			(*ndo_init)(struct net_device *dev);
@@ -1487,6 +1497,8 @@ struct net_device_ops {
 	int			(*ndo_tunnel_ctl)(struct net_device *dev,
 						  struct ip_tunnel_parm *p, int cmd);
 	struct net_device *	(*ndo_get_peer_dev)(struct net_device *dev);
+	int			(*ndo_lookup_mtu)(const struct sk_buff *skb,
+						  const struct net_device *dev);
 };
 
 /**
-- 
2.33.1


Thread overview: 7+ messages
2021-12-28 23:45 [RFC PATCH 0/4] Introduce per-peer MTU setting leon
2021-12-28 23:45 ` leon [this message]
2021-12-28 23:45 ` [RFC PATCH 2/4] net/ipv4: respect MTU determined by `ndo_lookup_mtu` leon
2021-12-28 23:45 ` [RFC PATCH 3/4] net/ipv6: " leon
2021-12-28 23:45 ` [RFC PATCH 4/4] net/wireguard: add per-peer MTU setting leon
2022-01-04 21:34 ` [RFC PATCH 0/4] Introduce " Toke Høiland-Jørgensen
2022-01-07 22:13   ` Leon Schuermann