passing-through TOS/DSCP marking

Development discussion of WireGuard
 help / color / mirror / Atom feed

* passing-through TOS/DSCP marking
@ 2021-06-16 13:24 Daniel Golle
  2021-06-16 16:28 ` Jason A. Donenfeld
  0 siblings, 1 reply; 22+ messages in thread
From: Daniel Golle @ 2021-06-16 13:24 UTC (permalink / raw)
  To: wireguard

Hi everyone,

First of all, I'm sorry if this topic has been dealt with before and I've
just not been able to find the answer in the archives.

I'm currently helping to setup a bunch of Wireguard links for a
non-profit humanitarian organization (using OpenWrt, ie. using the Linux
implementation on both ends). As these tunnels are carrying a diverse mix
of data, including VoIP, I was wondering if Wireguard passes-through the
TOS/DSCP marking of tunneled packages to the outer package's headers,
similar to what the 'passtos' option of OpenVPN is doing.

I would assume this is true by default, but I haven't found any
information about it, and as VoIP performance under load isn't as great
as I was expecting, I'm now writing to you here, hoping for a meaningful
answer which will preserve me from having to read the actual code...

Thank you for all the great work!

Best regards

Daniel

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: passing-through TOS/DSCP marking
  2021-06-16 13:24 passing-through TOS/DSCP marking Daniel Golle
@ 2021-06-16 16:28 ` Jason A. Donenfeld
  2021-06-16 19:26   ` Daniel Golle
  2021-07-06  7:00   ` Florent Daigniere
  0 siblings, 2 replies; 22+ messages in thread
From: Jason A. Donenfeld @ 2021-06-16 16:28 UTC (permalink / raw)
  To: daniel; +Cc: WireGuard mailing list

WireGuard does not copy the inner DSCP mark to the outside, aside from
the ECN bits, in order to avoid a data leak.

Jason

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: passing-through TOS/DSCP marking
  2021-06-16 16:28 ` Jason A. Donenfeld
@ 2021-06-16 19:26   ` Daniel Golle
  2021-06-16 23:33     ` Toke Høiland-Jørgensen
  2021-07-06  7:00   ` Florent Daigniere
  1 sibling, 1 reply; 22+ messages in thread
From: Daniel Golle @ 2021-06-16 19:26 UTC (permalink / raw)
  To: Jason A. Donenfeld; +Cc: WireGuard mailing list

Hi Jason,

On Wed, Jun 16, 2021 at 06:28:12PM +0200, Jason A. Donenfeld wrote:
> WireGuard does not copy the inner DSCP mark to the outside, aside from
> the ECN bits, in order to avoid a data leak.

That's a very valid argument.

However, from my experience now, Wireguard is not suitable for VoIP/RTP
data (minimize-delay) being sent through the same tunnel as TCP bulk
(maximize-throughput) traffic in bandwidth constraint and/or high-latency
environments, as that ruins the VoIP calls to the degree of not being
understandable. ECN helps quite a bit when it comes to avoid packet drops
for TCP traffic, but that's not enough to avoid high jitter and drops for
RTP/UDP traffic at the same time.

I thought about ways to improve that and wonder what you would suggest.
My ideas are:
 * have different tunnels depending on inner DSCP bits and mark them
   accordingly on the outside.
   => we already got multiple tunnels and that would double the number.

 * mark outer packets with DSCP bits based on their size.
   VoIP RTP/UDP packets are typically "medium sized" while TCP packets
   typically max out the MTU.
   => we would not leak information, but that assumption may not always
      be true

 * patch wireguard kernel code to allow preserving inner DSCP bits.
   => even only having 2 differentl classes of traffic (critical vs.
      bulk) would already help a lot...

What do you think? Any other ideas?

Cheers

Daniel

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: passing-through TOS/DSCP marking
  2021-06-16 19:26   ` Daniel Golle
@ 2021-06-16 23:33     ` Toke Høiland-Jørgensen
  2021-06-17  7:55       ` Florent Daigniere
  0 siblings, 1 reply; 22+ messages in thread
From: Toke Høiland-Jørgensen @ 2021-06-16 23:33 UTC (permalink / raw)
  To: Daniel Golle, Jason A. Donenfeld; +Cc: WireGuard mailing list

Daniel Golle <daniel@makrotopia.org> writes:

> Hi Jason,
>
> On Wed, Jun 16, 2021 at 06:28:12PM +0200, Jason A. Donenfeld wrote:
>> WireGuard does not copy the inner DSCP mark to the outside, aside from
>> the ECN bits, in order to avoid a data leak.
>
> That's a very valid argument.
>
> However, from my experience now, Wireguard is not suitable for VoIP/RTP
> data (minimize-delay) being sent through the same tunnel as TCP bulk
> (maximize-throughput) traffic in bandwidth constraint and/or high-latency
> environments, as that ruins the VoIP calls to the degree of not being
> understandable. ECN helps quite a bit when it comes to avoid packet drops
> for TCP traffic, but that's not enough to avoid high jitter and drops for
> RTP/UDP traffic at the same time.
>
> I thought about ways to improve that and wonder what you would suggest.
> My ideas are:
>  * have different tunnels depending on inner DSCP bits and mark them
>    accordingly on the outside.
>    => we already got multiple tunnels and that would double the number.
>
>  * mark outer packets with DSCP bits based on their size.
>    VoIP RTP/UDP packets are typically "medium sized" while TCP packets
>    typically max out the MTU.
>    => we would not leak information, but that assumption may not always
>       be true
>
>  * patch wireguard kernel code to allow preserving inner DSCP bits.
>    => even only having 2 differentl classes of traffic (critical vs.
>       bulk) would already help a lot...
>
>
> What do you think? Any other ideas?

Can you share a few more details about the network setup? I.e., where is
the bottleneck link that requires this special treatment? If the
bottleneck router is the same one that does the wireguard encapsulation
(e.g., a home router with a slow uplink), you should be able to just use
flow queueing (fq_codel or sch_cake) in place of your diffserv-based
prioritisation and get most of the benefit: wireguard will save the
packet hash before encapsulation, so any qdisc running on the same box
can actually distinguish flows even on the encapsulated packets...

-Toke

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: passing-through TOS/DSCP marking
  2021-06-16 23:33     ` Toke Høiland-Jørgensen
@ 2021-06-17  7:55       ` Florent Daigniere
  2021-06-17  9:41         ` Daniel Golle
  0 siblings, 1 reply; 22+ messages in thread
From: Florent Daigniere @ 2021-06-17  7:55 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen, Daniel Golle, Jason A. Donenfeld
  Cc: WireGuard mailing list

On Thu, 2021-06-17 at 01:33 +0200, Toke Høiland-Jørgensen wrote:
> Daniel Golle <daniel@makrotopia.org> writes:
> 
> > Hi Jason,
> > 
> > On Wed, Jun 16, 2021 at 06:28:12PM +0200, Jason A. Donenfeld wrote:
> > > WireGuard does not copy the inner DSCP mark to the outside, aside
> > > from
> > > the ECN bits, in order to avoid a data leak.
> > 
> > That's a very valid argument.
> > 
> > However, from my experience now, Wireguard is not suitable for
> > VoIP/RTP
> > data (minimize-delay) being sent through the same tunnel as TCP bulk
> > (maximize-throughput) traffic in bandwidth constraint and/or high-
> > latency
> > environments, as that ruins the VoIP calls to the degree of not
> > being
> > understandable. ECN helps quite a bit when it comes to avoid packet
> > drops
> > for TCP traffic, but that's not enough to avoid high jitter and
> > drops for
> > RTP/UDP traffic at the same time.
> > 
> > I thought about ways to improve that and wonder what you would
> > suggest.
> > My ideas are:
> >  * have different tunnels depending on inner DSCP bits and mark them
> >    accordingly on the outside.
> >    => we already got multiple tunnels and that would double the
> > number.
> > 
> >  * mark outer packets with DSCP bits based on their size.
> >    VoIP RTP/UDP packets are typically "medium sized" while TCP
> > packets
> >    typically max out the MTU.
> >    => we would not leak information, but that assumption may not
> > always
> >       be true
> > 
> >  * patch wireguard kernel code to allow preserving inner DSCP bits.
> >    => even only having 2 differentl classes of traffic (critical vs.
> >       bulk) would already help a lot...
> > 
> > 
> > What do you think? Any other ideas?
> 
> Can you share a few more details about the network setup? I.e., where
> is
> the bottleneck link that requires this special treatment?

I can tell you about mine. WiFi in a congested environment: "voip on
mobile phones". WMM/802.11e uses the diffserv markings; most commercial
APs will do the right thing provided packets are marked appropriately.

At the time I have sent patches (back in 2019) for both the golang and
linux implementation that turned it on by default. I believe that
Russell Strong further improved upon them by adding a knob (20190318 on
this mailing list).

Earlier this month I was approached by a NGO that was trying to do voip
over satlinks in between ships... there too, any solution has to involve
DSCP markings.

Florent



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: passing-through TOS/DSCP marking
  2021-06-17  7:55       ` Florent Daigniere
@ 2021-06-17  9:41         ` Daniel Golle
  2021-06-17 12:24           ` Toke Høiland-Jørgensen
  0 siblings, 1 reply; 22+ messages in thread
From: Daniel Golle @ 2021-06-17  9:41 UTC (permalink / raw)
  To: Florent Daigniere
  Cc: Toke Høiland-Jørgensen, Jason A. Donenfeld,
	WireGuard mailing list

Hi Florent,

On Thu, Jun 17, 2021 at 07:55:09AM +0000, Florent Daigniere wrote:
> On Thu, 2021-06-17 at 01:33 +0200, Toke Høiland-Jørgensen wrote:
> > Daniel Golle <daniel@makrotopia.org> writes:
> > 
> > > Hi Jason,
> > > 
> > > On Wed, Jun 16, 2021 at 06:28:12PM +0200, Jason A. Donenfeld wrote:
> > > > WireGuard does not copy the inner DSCP mark to the outside, aside
> > > > from
> > > > the ECN bits, in order to avoid a data leak.
> > > 
> > > That's a very valid argument.
> > > 
> > > However, from my experience now, Wireguard is not suitable for
> > > VoIP/RTP
> > > data (minimize-delay) being sent through the same tunnel as TCP bulk
> > > (maximize-throughput) traffic in bandwidth constraint and/or high-
> > > latency
> > > environments, as that ruins the VoIP calls to the degree of not
> > > being
> > > understandable. ECN helps quite a bit when it comes to avoid packet
> > > drops
> > > for TCP traffic, but that's not enough to avoid high jitter and
> > > drops for
> > > RTP/UDP traffic at the same time.
> > > 
> > > I thought about ways to improve that and wonder what you would
> > > suggest.
> > > My ideas are:
> > >  * have different tunnels depending on inner DSCP bits and mark them
> > >    accordingly on the outside.
> > >    => we already got multiple tunnels and that would double the
> > > number.
> > > 
> > >  * mark outer packets with DSCP bits based on their size.
> > >    VoIP RTP/UDP packets are typically "medium sized" while TCP
> > > packets
> > >    typically max out the MTU.
> > >    => we would not leak information, but that assumption may not
> > > always
> > >       be true
> > > 
> > >  * patch wireguard kernel code to allow preserving inner DSCP bits.
> > >    => even only having 2 differentl classes of traffic (critical vs.
> > >       bulk) would already help a lot...
> > > 
> > > 
> > > What do you think? Any other ideas?
> > 
> > Can you share a few more details about the network setup? I.e., where
> > is
> > the bottleneck link that requires this special treatment?
> 
> I can tell you about mine. WiFi in a congested environment: "voip on
> mobile phones". WMM/802.11e uses the diffserv markings; most commercial
> APs will do the right thing provided packets are marked appropriately.
> 
> At the time I have sent patches (back in 2019) for both the golang and
> linux implementation that turned it on by default. I believe that
> Russell Strong further improved upon them by adding a knob (20190318 on
> this mailing list).

Thank you very much for the hint!
This patch is exactly what I was looking for:
https://lists.zx2c4.com/pipermail/wireguard/2019-March/004026.html

Unfortunately it has not received a great amount of feedback back then.
I'll try forward-porting and deploying it now, because to me it looks
like the best solution money can buy :)


> 
> Earlier this month I was approached by a NGO that was trying to do voip
> over satlinks in between ships... there too, any solution has to involve
> DSCP markings.

This is quite exactly the scenario I'm working on :)

Also here, we got multiple uplinks (2x VSAT, 1x 3G/4G/5G, 1x auxilary,
e.g. WiFi client during shipyard times). All those uplinks are often
congested and we can't do anything about their buffer bloat and as all of
them rely on a shared medium, the total amount of available bandwidth
varies quite a bit, which makes it hard to setup meaningful shaping. I
did setup cake qdisc on the upstream links with generously estimated
bandwidth settings, and that did not help a lot.
However, all of those proprietary black-boxes do seem to care about the
DSCP markings and that seems to be the way forward.


Thank you!


Daniel


> 
> Florent
> 
> 

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: passing-through TOS/DSCP marking
  2021-06-17  9:41         ` Daniel Golle
@ 2021-06-17 12:24           ` Toke Høiland-Jørgensen
       [not found]             ` <CAMaqUZ09KRtp01OK3u-Di52X_kH9eT4E-wmnPc6QzjSCd5dEiw@mail.gmail.com>
  2021-06-17 23:04             ` Toke Høiland-Jørgensen
  0 siblings, 2 replies; 22+ messages in thread
From: Toke Høiland-Jørgensen @ 2021-06-17 12:24 UTC (permalink / raw)
  To: Daniel Golle, Florent Daigniere
  Cc: Jason A. Donenfeld, WireGuard mailing list

Daniel Golle <daniel@makrotopia.org> writes:

> Hi Florent,
>
> On Thu, Jun 17, 2021 at 07:55:09AM +0000, Florent Daigniere wrote:
>> On Thu, 2021-06-17 at 01:33 +0200, Toke Høiland-Jørgensen wrote:
>> > Daniel Golle <daniel@makrotopia.org> writes:
>> > 
>> > > Hi Jason,
>> > > 
>> > > On Wed, Jun 16, 2021 at 06:28:12PM +0200, Jason A. Donenfeld wrote:
>> > > > WireGuard does not copy the inner DSCP mark to the outside, aside
>> > > > from
>> > > > the ECN bits, in order to avoid a data leak.
>> > > 
>> > > That's a very valid argument.
>> > > 
>> > > However, from my experience now, Wireguard is not suitable for
>> > > VoIP/RTP
>> > > data (minimize-delay) being sent through the same tunnel as TCP bulk
>> > > (maximize-throughput) traffic in bandwidth constraint and/or high-
>> > > latency
>> > > environments, as that ruins the VoIP calls to the degree of not
>> > > being
>> > > understandable. ECN helps quite a bit when it comes to avoid packet
>> > > drops
>> > > for TCP traffic, but that's not enough to avoid high jitter and
>> > > drops for
>> > > RTP/UDP traffic at the same time.
>> > > 
>> > > I thought about ways to improve that and wonder what you would
>> > > suggest.
>> > > My ideas are:
>> > >  * have different tunnels depending on inner DSCP bits and mark them
>> > >    accordingly on the outside.
>> > >    => we already got multiple tunnels and that would double the
>> > > number.
>> > > 
>> > >  * mark outer packets with DSCP bits based on their size.
>> > >    VoIP RTP/UDP packets are typically "medium sized" while TCP
>> > > packets
>> > >    typically max out the MTU.
>> > >    => we would not leak information, but that assumption may not
>> > > always
>> > >       be true
>> > > 
>> > >  * patch wireguard kernel code to allow preserving inner DSCP bits.
>> > >    => even only having 2 differentl classes of traffic (critical vs.
>> > >       bulk) would already help a lot...
>> > > 
>> > > 
>> > > What do you think? Any other ideas?
>> > 
>> > Can you share a few more details about the network setup? I.e., where
>> > is
>> > the bottleneck link that requires this special treatment?
>> 
>> I can tell you about mine. WiFi in a congested environment: "voip on
>> mobile phones". WMM/802.11e uses the diffserv markings; most commercial
>> APs will do the right thing provided packets are marked appropriately.
>> 
>> At the time I have sent patches (back in 2019) for both the golang and
>> linux implementation that turned it on by default. I believe that
>> Russell Strong further improved upon them by adding a knob (20190318 on
>> this mailing list).
>
> Thank you very much for the hint!
> This patch is exactly what I was looking for:
> https://lists.zx2c4.com/pipermail/wireguard/2019-March/004026.html
>
> Unfortunately it has not received a great amount of feedback back then.
> I'll try forward-porting and deploying it now, because to me it looks
> like the best solution money can buy :)

I think you can achieve something similar using BPF filters, by relying
on wireguard passing through the skb->hash value when encrypting.

Simply attach a TC-BPF filter to the wireguard netdev, pull out the DSCP
value and store it in a map keyed on skb->hash. Then, run a second BPF
filter on the physical interface that shares that same map, lookup the
DSCP value based on the skb->hash value, and rewrite the outer IP
header.

The read-side filter will need to use bpf_get_hash_recalc() to make sure
the hash is calculated before the packet gets handed to wireguard, and
it'll be subject to hash collisions, but I think it should generally
work fairly well (for anything that's flow-based of course). And it can
be done without patching wireguard itself :)

-Toke

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: passing-through TOS/DSCP marking
       [not found]             ` <CAMaqUZ09KRtp01OK3u-Di52X_kH9eT4E-wmnPc6QzjSCd5dEiw@mail.gmail.com>
@ 2021-06-17 20:54               ` Toke Høiland-Jørgensen
  0 siblings, 0 replies; 22+ messages in thread
From: Toke Høiland-Jørgensen @ 2021-06-17 20:54 UTC (permalink / raw)
  To: Reid Rankin
  Cc: Daniel Golle, Florent Daigniere, Jason A. Donenfeld,
	WireGuard mailing list

Reid Rankin <reidrankin@gmail.com> writes:

> It can also be done in a shell script with nftables (maybe iptables too,
> haven't tried) by taking advantage of fwmark passthrough. You can have one
> rule that matches incoming outgoing packets (heh) with a certain dscp value
> and marks them, and another rule that matches outgoing outgoing packets
> with that mark and sets the DSCP bits back.

The fwmark is not passed through wireguard, though, it's cleared during
skb scrubbing:

https://elixir.bootlin.com/linux/latest/source/net/core/skbuff.c#L5344

There's an fwmark config that you can set which will make wireguard
apply a certain mark to all outgoing packets, but that has nothing to
do with what was set on the inner packet...

-Toke

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: passing-through TOS/DSCP marking
  2021-06-17 12:24           ` Toke Høiland-Jørgensen
       [not found]             ` <CAMaqUZ09KRtp01OK3u-Di52X_kH9eT4E-wmnPc6QzjSCd5dEiw@mail.gmail.com>
@ 2021-06-17 23:04             ` Toke Høiland-Jørgensen
  2021-06-18 12:24               ` Jason A. Donenfeld
  1 sibling, 1 reply; 22+ messages in thread
From: Toke Høiland-Jørgensen @ 2021-06-17 23:04 UTC (permalink / raw)
  To: Daniel Golle, Florent Daigniere
  Cc: Jason A. Donenfeld, WireGuard mailing list

Toke Høiland-Jørgensen <toke@toke.dk> writes:

> Daniel Golle <daniel@makrotopia.org> writes:
>
>> Hi Florent,
>>
>> On Thu, Jun 17, 2021 at 07:55:09AM +0000, Florent Daigniere wrote:
>>> On Thu, 2021-06-17 at 01:33 +0200, Toke Høiland-Jørgensen wrote:
>>> > Daniel Golle <daniel@makrotopia.org> writes:
>>> > 
>>> > > Hi Jason,
>>> > > 
>>> > > On Wed, Jun 16, 2021 at 06:28:12PM +0200, Jason A. Donenfeld wrote:
>>> > > > WireGuard does not copy the inner DSCP mark to the outside, aside
>>> > > > from
>>> > > > the ECN bits, in order to avoid a data leak.
>>> > > 
>>> > > That's a very valid argument.
>>> > > 
>>> > > However, from my experience now, Wireguard is not suitable for
>>> > > VoIP/RTP
>>> > > data (minimize-delay) being sent through the same tunnel as TCP bulk
>>> > > (maximize-throughput) traffic in bandwidth constraint and/or high-
>>> > > latency
>>> > > environments, as that ruins the VoIP calls to the degree of not
>>> > > being
>>> > > understandable. ECN helps quite a bit when it comes to avoid packet
>>> > > drops
>>> > > for TCP traffic, but that's not enough to avoid high jitter and
>>> > > drops for
>>> > > RTP/UDP traffic at the same time.
>>> > > 
>>> > > I thought about ways to improve that and wonder what you would
>>> > > suggest.
>>> > > My ideas are:
>>> > >  * have different tunnels depending on inner DSCP bits and mark them
>>> > >    accordingly on the outside.
>>> > >    => we already got multiple tunnels and that would double the
>>> > > number.
>>> > > 
>>> > >  * mark outer packets with DSCP bits based on their size.
>>> > >    VoIP RTP/UDP packets are typically "medium sized" while TCP
>>> > > packets
>>> > >    typically max out the MTU.
>>> > >    => we would not leak information, but that assumption may not
>>> > > always
>>> > >       be true
>>> > > 
>>> > >  * patch wireguard kernel code to allow preserving inner DSCP bits.
>>> > >    => even only having 2 differentl classes of traffic (critical vs.
>>> > >       bulk) would already help a lot...
>>> > > 
>>> > > 
>>> > > What do you think? Any other ideas?
>>> > 
>>> > Can you share a few more details about the network setup? I.e., where
>>> > is
>>> > the bottleneck link that requires this special treatment?
>>> 
>>> I can tell you about mine. WiFi in a congested environment: "voip on
>>> mobile phones". WMM/802.11e uses the diffserv markings; most commercial
>>> APs will do the right thing provided packets are marked appropriately.
>>> 
>>> At the time I have sent patches (back in 2019) for both the golang and
>>> linux implementation that turned it on by default. I believe that
>>> Russell Strong further improved upon them by adding a knob (20190318 on
>>> this mailing list).
>>
>> Thank you very much for the hint!
>> This patch is exactly what I was looking for:
>> https://lists.zx2c4.com/pipermail/wireguard/2019-March/004026.html
>>
>> Unfortunately it has not received a great amount of feedback back then.
>> I'll try forward-porting and deploying it now, because to me it looks
>> like the best solution money can buy :)
>
> I think you can achieve something similar using BPF filters, by relying
> on wireguard passing through the skb->hash value when encrypting.
>
> Simply attach a TC-BPF filter to the wireguard netdev, pull out the DSCP
> value and store it in a map keyed on skb->hash. Then, run a second BPF
> filter on the physical interface that shares that same map, lookup the
> DSCP value based on the skb->hash value, and rewrite the outer IP
> header.
>
> The read-side filter will need to use bpf_get_hash_recalc() to make sure
> the hash is calculated before the packet gets handed to wireguard, and
> it'll be subject to hash collisions, but I think it should generally
> work fairly well (for anything that's flow-based of course). And it can
> be done without patching wireguard itself :)

Just for fun I implemented such a pair of eBPF filters, and tested that
it does indeed work for preserving DSCP marks on a Wireguard tunnel. The
PoC is here:

https://github.com/xdp-project/bpf-examples/tree/master/preserve-dscp

To try it out (you'll need a recent-ish kernel and clang version) run:

git clone --recurse-submodules https://github.com/xdp-project/bpf-examples
cd bpf-examples/preserve-dscp
make
./preserve-dscp wg0 eth0

(assuming wg0 and eth0 are the wireguard and physical interfaces in
question, respectively).

To actually deploy this it would probably need a few tweaks; in
particular the second filter that rewrites packets should probably check
that the packets are actually part of the Wireguard tunnel in question
(by parsing the UDP header and checking the source port) before writing
anything to the packet.

-Toke

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: passing-through TOS/DSCP marking
  2021-06-17 23:04             ` Toke Høiland-Jørgensen
@ 2021-06-18 12:24               ` Jason A. Donenfeld
  2021-06-21 12:36                 ` Daniel Golle
  0 siblings, 1 reply; 22+ messages in thread
From: Jason A. Donenfeld @ 2021-06-18 12:24 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen
  Cc: Daniel Golle, Florent Daigniere, WireGuard mailing list

Hey Toke,

On Fri, Jun 18, 2021 at 1:05 AM Toke Høiland-Jørgensen <toke@toke.dk> wrote:
> > I think you can achieve something similar using BPF filters, by relying
> > on wireguard passing through the skb->hash value when encrypting.
> >
> > Simply attach a TC-BPF filter to the wireguard netdev, pull out the DSCP
> > value and store it in a map keyed on skb->hash. Then, run a second BPF
> > filter on the physical interface that shares that same map, lookup the
> > DSCP value based on the skb->hash value, and rewrite the outer IP
> > header.
> >
> > The read-side filter will need to use bpf_get_hash_recalc() to make sure
> > the hash is calculated before the packet gets handed to wireguard, and
> > it'll be subject to hash collisions, but I think it should generally
> > work fairly well (for anything that's flow-based of course). And it can
> > be done without patching wireguard itself :)
>
> Just for fun I implemented such a pair of eBPF filters, and tested that
> it does indeed work for preserving DSCP marks on a Wireguard tunnel. The
> PoC is here:
>
> https://github.com/xdp-project/bpf-examples/tree/master/preserve-dscp
>
> To try it out (you'll need a recent-ish kernel and clang version) run:
>
> git clone --recurse-submodules https://github.com/xdp-project/bpf-examples
> cd bpf-examples/preserve-dscp
> make
> ./preserve-dscp wg0 eth0
>
> (assuming wg0 and eth0 are the wireguard and physical interfaces in
> question, respectively).
>
> To actually deploy this it would probably need a few tweaks; in
> particular the second filter that rewrites packets should probably check
> that the packets are actually part of the Wireguard tunnel in question
> (by parsing the UDP header and checking the source port) before writing
> anything to the packet.
>
> -Toke

That is a super cool approach. Thanks for writing that! Sounds like a
good approach, and one pretty easy to deploy, without the need to
patch kernels and such.

Also, nice usage of BPF_MAP_TYPE_LRU_HASH for this.

Daniel -- can you let the list know if this works for your use case?

Jason

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: passing-through TOS/DSCP marking
  2021-06-18 12:24               ` Jason A. Donenfeld
@ 2021-06-21 12:36                 ` Daniel Golle
  2021-06-21 14:27                   ` Toke Høiland-Jørgensen
  0 siblings, 1 reply; 22+ messages in thread
From: Daniel Golle @ 2021-06-21 12:36 UTC (permalink / raw)
  To: Jason A. Donenfeld
  Cc: Toke Høiland-Jørgensen, Florent Daigniere,
	WireGuard mailing list

On Fri, Jun 18, 2021 at 02:24:29PM +0200, Jason A. Donenfeld wrote:
> Hey Toke,
> 
> On Fri, Jun 18, 2021 at 1:05 AM Toke Høiland-Jørgensen <toke@toke.dk> wrote:
> > > I think you can achieve something similar using BPF filters, by relying
> > > on wireguard passing through the skb->hash value when encrypting.
> > >
> > > Simply attach a TC-BPF filter to the wireguard netdev, pull out the DSCP
> > > value and store it in a map keyed on skb->hash. Then, run a second BPF
> > > filter on the physical interface that shares that same map, lookup the
> > > DSCP value based on the skb->hash value, and rewrite the outer IP
> > > header.
> > >
> > > The read-side filter will need to use bpf_get_hash_recalc() to make sure
> > > the hash is calculated before the packet gets handed to wireguard, and
> > > it'll be subject to hash collisions, but I think it should generally
> > > work fairly well (for anything that's flow-based of course). And it can
> > > be done without patching wireguard itself :)
> >
> > Just for fun I implemented such a pair of eBPF filters, and tested that
> > it does indeed work for preserving DSCP marks on a Wireguard tunnel. The
> > PoC is here:
> >
> > https://github.com/xdp-project/bpf-examples/tree/master/preserve-dscp
> >
> > To try it out (you'll need a recent-ish kernel and clang version) run:
> >
> > git clone --recurse-submodules https://github.com/xdp-project/bpf-examples
> > cd bpf-examples/preserve-dscp
> > make
> > ./preserve-dscp wg0 eth0
> >
> > (assuming wg0 and eth0 are the wireguard and physical interfaces in
> > question, respectively).
> >
> > To actually deploy this it would probably need a few tweaks; in
> > particular the second filter that rewrites packets should probably check
> > that the packets are actually part of the Wireguard tunnel in question
> > (by parsing the UDP header and checking the source port) before writing
> > anything to the packet.
> >
> > -Toke
> 
> That is a super cool approach. Thanks for writing that! Sounds like a
> good approach, and one pretty easy to deploy, without the need to
> patch kernels and such.
> 
> Also, nice usage of BPF_MAP_TYPE_LRU_HASH for this.
> 
> Daniel -- can you let the list know if this works for your use case?

Turns out not exactly easy to deploy (on OpenWrt), as it depends on an
extremely recent environment. I will try pushing to that direction, but
it doesn't look like it's going to be ready very soon.

In terms of toolchain: LLVM/Clang is a very bulky beast, I gave up on
that and started working on integrating GCC-10's BPF target in our build
system...

In terms of kernel support: recent kernels don't build yet because of
gelf_getsymshndx, so we got to update libelf first for that. Recent
libelf doesn't seem to be an option yet on many of the build hosts we
currently support (Darwin and such).

In terms of library support: our build of libbpf comes from Linux
release tarballs. There isn't yet a release supporting bpf_tc_attach,
the easiest would be to wait for Linux 5.13 to be released.

I (of course ;) also tried and spend almost a day looking for a
quick-and-dirty path for temporary deployment, so I could at least give
feedback -- bpf-examples also isn't exactly made to be cross-compiled
manually, so I have failed with that as well so far.

In general I still believe this is a great approach and I would really
like to use it as soon as possible, even just to give you feedback.

Cheers

Daniel

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: passing-through TOS/DSCP marking
  2021-06-21 12:36                 ` Daniel Golle
@ 2021-06-21 14:27                   ` Toke Høiland-Jørgensen
  2021-06-30 17:23                     ` Daniel Golle
  0 siblings, 1 reply; 22+ messages in thread
From: Toke Høiland-Jørgensen @ 2021-06-21 14:27 UTC (permalink / raw)
  To: Daniel Golle, Jason A. Donenfeld
  Cc: Florent Daigniere, WireGuard mailing list

Daniel Golle <daniel@makrotopia.org> writes:

> On Fri, Jun 18, 2021 at 02:24:29PM +0200, Jason A. Donenfeld wrote:
>> Hey Toke,
>> 
>> On Fri, Jun 18, 2021 at 1:05 AM Toke Høiland-Jørgensen <toke@toke.dk> wrote:
>> > > I think you can achieve something similar using BPF filters, by relying
>> > > on wireguard passing through the skb->hash value when encrypting.
>> > >
>> > > Simply attach a TC-BPF filter to the wireguard netdev, pull out the DSCP
>> > > value and store it in a map keyed on skb->hash. Then, run a second BPF
>> > > filter on the physical interface that shares that same map, lookup the
>> > > DSCP value based on the skb->hash value, and rewrite the outer IP
>> > > header.
>> > >
>> > > The read-side filter will need to use bpf_get_hash_recalc() to make sure
>> > > the hash is calculated before the packet gets handed to wireguard, and
>> > > it'll be subject to hash collisions, but I think it should generally
>> > > work fairly well (for anything that's flow-based of course). And it can
>> > > be done without patching wireguard itself :)
>> >
>> > Just for fun I implemented such a pair of eBPF filters, and tested that
>> > it does indeed work for preserving DSCP marks on a Wireguard tunnel. The
>> > PoC is here:
>> >
>> > https://github.com/xdp-project/bpf-examples/tree/master/preserve-dscp
>> >
>> > To try it out (you'll need a recent-ish kernel and clang version) run:
>> >
>> > git clone --recurse-submodules https://github.com/xdp-project/bpf-examples
>> > cd bpf-examples/preserve-dscp
>> > make
>> > ./preserve-dscp wg0 eth0
>> >
>> > (assuming wg0 and eth0 are the wireguard and physical interfaces in
>> > question, respectively).
>> >
>> > To actually deploy this it would probably need a few tweaks; in
>> > particular the second filter that rewrites packets should probably check
>> > that the packets are actually part of the Wireguard tunnel in question
>> > (by parsing the UDP header and checking the source port) before writing
>> > anything to the packet.
>> >
>> > -Toke
>> 
>> That is a super cool approach. Thanks for writing that! Sounds like a
>> good approach, and one pretty easy to deploy, without the need to
>> patch kernels and such.
>> 
>> Also, nice usage of BPF_MAP_TYPE_LRU_HASH for this.
>> 
>> Daniel -- can you let the list know if this works for your use case?
>
> Turns out not exactly easy to deploy (on OpenWrt), as it depends on an
> extremely recent environment. I will try pushing to that direction, but
> it doesn't look like it's going to be ready very soon.
>
> In terms of toolchain: LLVM/Clang is a very bulky beast, I gave up on
> that and started working on integrating GCC-10's BPF target in our build
> system...

I saw that, but I have no idea if GCC's BPF target support will support
this. My tentative guess would be no, unfortunately :(

An alternative to getting LLVM built as part of the OpenWrt toolchain is
to just use the host clang to build the BPF binaries. It doesn't
actually need to be cross-compiled with a special compiler, the BPF byte
code format is the same on all architectures except for endianness, so
just passing that to the host clang should theoretically be enough...

> In terms of kernel support: recent kernels don't build yet because of
> gelf_getsymshndx, so we got to update libelf first for that. Recent
> libelf doesn't seem to be an option yet on many of the build hosts we
> currently support (Darwin and such).
>
> In terms of library support: our build of libbpf comes from Linux
> release tarballs. There isn't yet a release supporting bpf_tc_attach,
> the easiest would be to wait for Linux 5.13 to be released.

I used the libbpf TC loading support for convenience, but it's possible
to load it using 'tc' as well without too much trouble (right now the
userspace component sets a config variable before loading the program,
but it can be restructured to not need that).

Alternatively, the bpf-examples repository is setup with a libbpf
submodule that it can link statically against, so you could use that for
now?

> I (of course ;) also tried and spend almost a day looking for a
> quick-and-dirty path for temporary deployment, so I could at least give
> feedback -- bpf-examples also isn't exactly made to be cross-compiled
> manually, so I have failed with that as well so far.

Heh, no, it isn't, really. Anything in particular you need to make this
easier? We already added some bits to xdp-tools for supporting
cross-compilation (and that shares some lineage with bpf-examples), so
porting those over should not be too difficult.

See: https://github.com/xdp-project/xdp-tools/pull/78 and
https://github.com/xdp-project/xdp-tools/issues/74

Unfortunately I don't have a lot of time to poke more at this right now,
but feel free to open up an issue / pull request to the bpf-examples
repository with any changes you need :)

-Toke

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: passing-through TOS/DSCP marking
  2021-06-21 14:27                   ` Toke Høiland-Jørgensen
@ 2021-06-30 17:23                     ` Daniel Golle
  2021-06-30 20:55                       ` Toke Høiland-Jørgensen
  0 siblings, 1 reply; 22+ messages in thread
From: Daniel Golle @ 2021-06-30 17:23 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen
  Cc: Jason A. Donenfeld, Florent Daigniere, WireGuard mailing list

Hi Toke,

On Mon, Jun 21, 2021 at 04:27:08PM +0200, Toke Høiland-Jørgensen wrote:
> Daniel Golle <daniel@makrotopia.org> writes:
> 
> > On Fri, Jun 18, 2021 at 02:24:29PM +0200, Jason A. Donenfeld wrote:
> >> Hey Toke,
> >> 
> >> On Fri, Jun 18, 2021 at 1:05 AM Toke Høiland-Jørgensen <toke@toke.dk> wrote:
> >> > > I think you can achieve something similar using BPF filters, by relying
> >> > > on wireguard passing through the skb->hash value when encrypting.
> >> > >
> >> > > Simply attach a TC-BPF filter to the wireguard netdev, pull out the DSCP
> >> > > value and store it in a map keyed on skb->hash. Then, run a second BPF
> >> > > filter on the physical interface that shares that same map, lookup the
> >> > > DSCP value based on the skb->hash value, and rewrite the outer IP
> >> > > header.
> >> > >
> >> > > The read-side filter will need to use bpf_get_hash_recalc() to make sure
> >> > > the hash is calculated before the packet gets handed to wireguard, and
> >> > > it'll be subject to hash collisions, but I think it should generally
> >> > > work fairly well (for anything that's flow-based of course). And it can
> >> > > be done without patching wireguard itself :)
> >> >
> >> > Just for fun I implemented such a pair of eBPF filters, and tested that
> >> > it does indeed work for preserving DSCP marks on a Wireguard tunnel. The
> >> > PoC is here:
> >> >
> >> > https://github.com/xdp-project/bpf-examples/tree/master/preserve-dscp
> >> >
> >> > To try it out (you'll need a recent-ish kernel and clang version) run:
> >> >
> >> > git clone --recurse-submodules https://github.com/xdp-project/bpf-examples
> >> > cd bpf-examples/preserve-dscp
> >> > make
> >> > ./preserve-dscp wg0 eth0
> >> >
> >> > (assuming wg0 and eth0 are the wireguard and physical interfaces in
> >> > question, respectively).
> >> >
> >> > To actually deploy this it would probably need a few tweaks; in
> >> > particular the second filter that rewrites packets should probably check
> >> > that the packets are actually part of the Wireguard tunnel in question
> >> > (by parsing the UDP header and checking the source port) before writing
> >> > anything to the packet.
> >> >
> >> > -Toke
> >> 
> >> That is a super cool approach. Thanks for writing that! Sounds like a
> >> good approach, and one pretty easy to deploy, without the need to
> >> patch kernels and such.
> >> 
> >> Also, nice usage of BPF_MAP_TYPE_LRU_HASH for this.
> >> 
> >> Daniel -- can you let the list know if this works for your use case?
> >
> > Turns out not exactly easy to deploy (on OpenWrt), as it depends on an
> > extremely recent environment. I will try pushing to that direction, but
> > it doesn't look like it's going to be ready very soon.
> >
> > In terms of toolchain: LLVM/Clang is a very bulky beast, I gave up on
> > that and started working on integrating GCC-10's BPF target in our build
> > system...
> 
> I saw that, but I have no idea if GCC's BPF target support will support
> this. My tentative guess would be no, unfortunately :(

Probably you are right. When building the BPF object with GCC, the
result is:
root@OpenWrt:/usr/lib/bpf# preserve-dscp wg0 eth0
libbpf: elf: skipping unrecognized data section(4) .stab
libbpf: elf: skipping relo section(5) .rel.stab for section(4) .stab
libbpf: elf: skipping unrecognized data section(13) .comment
libbpf: BTF is required, but is missing or corrupted.
Couldn't open file: preserve_dscp_kern.o

Using the LLVM/Clang compiled object also doesn't work:
root@OpenWrt:/usr/lib/bpf# preserve-dscp wg0 eth0
libbpf: Error in bpf_create_map_xattr(flow_dscps):Operation not permitted(-1). Retrying without BTF.
libbpf: map 'flow_dscps': failed to create: Operation not permitted(-1)
libbpf: permission error while running as root; try raising 'ulimit -l'? current value: 512.0 KiB
libbpf: failed to load object 'preserve_dscp_kern.o'
Failed to load object

Probably Kernel 5.4.124 is too old...?


> 
> An alternative to getting LLVM built as part of the OpenWrt toolchain is
> to just use the host clang to build the BPF binaries. It doesn't
> actually need to be cross-compiled with a special compiler, the BPF byte
> code format is the same on all architectures except for endianness, so
> just passing that to the host clang should theoretically be enough...

I believe that having a way to build BPF objects compatible with the
target built-into our toolchain would be a huge step forward.
And given that gcc already get's pretty far, I think it'd be worth
fixing/patching what ever is missing (I haven't even tried GCC-11 yet)

Find my staging tree including 'preserve-dscp' ready to play with:

https://git.openwrt.org/?p=openwrt/staging/dangole.git;a=shortlog;h=refs/heads/gcc10-bpf

Select 'Enable experimental features by default', but note that toolchain
doesn't build when selecting Linux 5.10 for x86, so you need to un-select
'Use testing Kernel' if building for x86.
And have a look at the patch for allow building bpf-examples BPF objects
with GCC in package/network/utils/bpf-examples/patches


> 
> > In terms of kernel support: recent kernels don't build yet because of
> > gelf_getsymshndx, so we got to update libelf first for that. Recent
> > libelf doesn't seem to be an option yet on many of the build hosts we
> > currently support (Darwin and such).
> >
> > In terms of library support: our build of libbpf comes from Linux
> > release tarballs. There isn't yet a release supporting bpf_tc_attach,
> > the easiest would be to wait for Linux 5.13 to be released.
> 
> I used the libbpf TC loading support for convenience, but it's possible
> to load it using 'tc' as well without too much trouble (right now the
> userspace component sets a config variable before loading the program,
> but it can be restructured to not need that).
> 
> Alternatively, the bpf-examples repository is setup with a libbpf
> submodule that it can link statically against, so you could use that for
> now?

I've updated to 5.13 + patches on top, so now it builds :)
Library-embedding is a no-go for OpenWrt. Having different ABI-versions
of libraries installed simultanously works, so we can just ship with
a more recent version of libbpf.

> 
> > I (of course ;) also tried and spend almost a day looking for a
> > quick-and-dirty path for temporary deployment, so I could at least give
> > feedback -- bpf-examples also isn't exactly made to be cross-compiled
> > manually, so I have failed with that as well so far.
> 
> Heh, no, it isn't, really. Anything in particular you need to make this
> easier? We already added some bits to xdp-tools for supporting
> cross-compilation (and that shares some lineage with bpf-examples), so
> porting those over should not be too difficult.

I found my way around, see the packaging for bpf-examples in the tree
(link above, at path stated above)

> 
> See: https://github.com/xdp-project/xdp-tools/pull/78 and
> https://github.com/xdp-project/xdp-tools/issues/74
> 
> Unfortunately I don't have a lot of time to poke more at this right now,
> but feel free to open up an issue / pull request to the bpf-examples
> repository with any changes you need :)

I guess I'll just go ahead then and package xdp-tools :)


Cheers


Daniel

> 
> -Toke

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: passing-through TOS/DSCP marking
  2021-06-30 17:23                     ` Daniel Golle
@ 2021-06-30 20:55                       ` Toke Høiland-Jørgensen
  2021-07-04 14:15                         ` Daniel Golle
  0 siblings, 1 reply; 22+ messages in thread
From: Toke Høiland-Jørgensen @ 2021-06-30 20:55 UTC (permalink / raw)
  To: Daniel Golle
  Cc: Jason A. Donenfeld, Florent Daigniere, WireGuard mailing list

Daniel Golle <daniel@makrotopia.org> writes:

> Hi Toke,
>
> On Mon, Jun 21, 2021 at 04:27:08PM +0200, Toke Høiland-Jørgensen wrote:
>> Daniel Golle <daniel@makrotopia.org> writes:
>> 
>> > On Fri, Jun 18, 2021 at 02:24:29PM +0200, Jason A. Donenfeld wrote:
>> >> Hey Toke,
>> >> 
>> >> On Fri, Jun 18, 2021 at 1:05 AM Toke Høiland-Jørgensen <toke@toke.dk> wrote:
>> >> > > I think you can achieve something similar using BPF filters, by relying
>> >> > > on wireguard passing through the skb->hash value when encrypting.
>> >> > >
>> >> > > Simply attach a TC-BPF filter to the wireguard netdev, pull out the DSCP
>> >> > > value and store it in a map keyed on skb->hash. Then, run a second BPF
>> >> > > filter on the physical interface that shares that same map, lookup the
>> >> > > DSCP value based on the skb->hash value, and rewrite the outer IP
>> >> > > header.
>> >> > >
>> >> > > The read-side filter will need to use bpf_get_hash_recalc() to make sure
>> >> > > the hash is calculated before the packet gets handed to wireguard, and
>> >> > > it'll be subject to hash collisions, but I think it should generally
>> >> > > work fairly well (for anything that's flow-based of course). And it can
>> >> > > be done without patching wireguard itself :)
>> >> >
>> >> > Just for fun I implemented such a pair of eBPF filters, and tested that
>> >> > it does indeed work for preserving DSCP marks on a Wireguard tunnel. The
>> >> > PoC is here:
>> >> >
>> >> > https://github.com/xdp-project/bpf-examples/tree/master/preserve-dscp
>> >> >
>> >> > To try it out (you'll need a recent-ish kernel and clang version) run:
>> >> >
>> >> > git clone --recurse-submodules https://github.com/xdp-project/bpf-examples
>> >> > cd bpf-examples/preserve-dscp
>> >> > make
>> >> > ./preserve-dscp wg0 eth0
>> >> >
>> >> > (assuming wg0 and eth0 are the wireguard and physical interfaces in
>> >> > question, respectively).
>> >> >
>> >> > To actually deploy this it would probably need a few tweaks; in
>> >> > particular the second filter that rewrites packets should probably check
>> >> > that the packets are actually part of the Wireguard tunnel in question
>> >> > (by parsing the UDP header and checking the source port) before writing
>> >> > anything to the packet.
>> >> >
>> >> > -Toke
>> >> 
>> >> That is a super cool approach. Thanks for writing that! Sounds like a
>> >> good approach, and one pretty easy to deploy, without the need to
>> >> patch kernels and such.
>> >> 
>> >> Also, nice usage of BPF_MAP_TYPE_LRU_HASH for this.
>> >> 
>> >> Daniel -- can you let the list know if this works for your use case?
>> >
>> > Turns out not exactly easy to deploy (on OpenWrt), as it depends on an
>> > extremely recent environment. I will try pushing to that direction, but
>> > it doesn't look like it's going to be ready very soon.
>> >
>> > In terms of toolchain: LLVM/Clang is a very bulky beast, I gave up on
>> > that and started working on integrating GCC-10's BPF target in our build
>> > system...
>> 
>> I saw that, but I have no idea if GCC's BPF target support will support
>> this. My tentative guess would be no, unfortunately :(
>
> Probably you are right. When building the BPF object with GCC, the
> result is:
> root@OpenWrt:/usr/lib/bpf# preserve-dscp wg0 eth0
> libbpf: elf: skipping unrecognized data section(4) .stab
> libbpf: elf: skipping relo section(5) .rel.stab for section(4) .stab
> libbpf: elf: skipping unrecognized data section(13) .comment
> libbpf: BTF is required, but is missing or corrupted.
> Couldn't open file: preserve_dscp_kern.o

Hmm, for this example it should be possible to make it run without BTF.
I'm only using that for the map definition, so that could be changed to
the old format; you could try this patch:

diff --git a/preserve-dscp/preserve_dscp_kern.c b/preserve-dscp/preserve_dscp_kern.c
index 24120cb8a3ff..08248e1f0e41 100644
--- a/preserve-dscp/preserve_dscp_kern.c
+++ b/preserve-dscp/preserve_dscp_kern.c
@@ -9,12 +9,12 @@
  * otherwise clean up stale entries. Instead, we just rely on the LRU mechanism
  * to evict old entries as the map fills up.
  */
-struct {
-       __uint(type, BPF_MAP_TYPE_LRU_HASH);
-       __type(key, __u32);
-       __type(value, __u8);
-       __uint(max_entries, 16384);
-} flow_dscps SEC(".maps");
+struct bpf_map_def SEC("maps") flow_dscps = {
+       .type           = BPF_MAP_TYPE_LRU_HASH,
+       .key_size       = sizeof(__u32),
+       .value_size     = sizeof(__u8),
+       .max_entries    = 16384,
+};
 
 const volatile static int ip_only = 0;

> Using the LLVM/Clang compiled object also doesn't work:
> root@OpenWrt:/usr/lib/bpf# preserve-dscp wg0 eth0
> libbpf: Error in bpf_create_map_xattr(flow_dscps):Operation not permitted(-1). Retrying without BTF.
> libbpf: map 'flow_dscps': failed to create: Operation not permitted(-1)
> libbpf: permission error while running as root; try raising 'ulimit -l'? current value: 512.0 KiB
> libbpf: failed to load object 'preserve_dscp_kern.o'
> Failed to load object
>
> Probably Kernel 5.4.124 is too old...?

Here I think the hint is in the error message ;)

>> An alternative to getting LLVM built as part of the OpenWrt toolchain is
>> to just use the host clang to build the BPF binaries. It doesn't
>> actually need to be cross-compiled with a special compiler, the BPF byte
>> code format is the same on all architectures except for endianness, so
>> just passing that to the host clang should theoretically be enough...
>
> I believe that having a way to build BPF objects compatible with the
> target built-into our toolchain would be a huge step forward.
> And given that gcc already get's pretty far, I think it'd be worth
> fixing/patching what ever is missing (I haven't even tried GCC-11 yet)

For this example that might work (as noted above), but for other things
BTF is a hard requirement, and I don't believe GCC supports that at all,
sadly :(

> Find my staging tree including 'preserve-dscp' ready to play with:
>
> https://git.openwrt.org/?p=openwrt/staging/dangole.git;a=shortlog;h=refs/heads/gcc10-bpf
>
> Select 'Enable experimental features by default', but note that toolchain
> doesn't build when selecting Linux 5.10 for x86, so you need to un-select
> 'Use testing Kernel' if building for x86.
> And have a look at the patch for allow building bpf-examples BPF objects
> with GCC in package/network/utils/bpf-examples/patches
>
>
>> 
>> > In terms of kernel support: recent kernels don't build yet because of
>> > gelf_getsymshndx, so we got to update libelf first for that. Recent
>> > libelf doesn't seem to be an option yet on many of the build hosts we
>> > currently support (Darwin and such).
>> >
>> > In terms of library support: our build of libbpf comes from Linux
>> > release tarballs. There isn't yet a release supporting bpf_tc_attach,
>> > the easiest would be to wait for Linux 5.13 to be released.
>> 
>> I used the libbpf TC loading support for convenience, but it's possible
>> to load it using 'tc' as well without too much trouble (right now the
>> userspace component sets a config variable before loading the program,
>> but it can be restructured to not need that).
>> 
>> Alternatively, the bpf-examples repository is setup with a libbpf
>> submodule that it can link statically against, so you could use that for
>> now?
>
> I've updated to 5.13 + patches on top, so now it builds :)

Alright, that works.

> Library-embedding is a no-go for OpenWrt. Having different ABI-versions
> of libraries installed simultanously works, so we can just ship with
> a more recent version of libbpf.

Yeah, I wasn't suggesting it as a permanent solution, just so you could
test it out :)

>> > I (of course ;) also tried and spend almost a day looking for a
>> > quick-and-dirty path for temporary deployment, so I could at least give
>> > feedback -- bpf-examples also isn't exactly made to be cross-compiled
>> > manually, so I have failed with that as well so far.
>> 
>> Heh, no, it isn't, really. Anything in particular you need to make this
>> easier? We already added some bits to xdp-tools for supporting
>> cross-compilation (and that shares some lineage with bpf-examples), so
>> porting those over should not be too difficult.
>
> I found my way around, see the packaging for bpf-examples in the tree
> (link above, at path stated above)

Right, I see. 

>> 
>> See: https://github.com/xdp-project/xdp-tools/pull/78 and
>> https://github.com/xdp-project/xdp-tools/issues/74
>> 
>> Unfortunately I don't have a lot of time to poke more at this right now,
>> but feel free to open up an issue / pull request to the bpf-examples
>> repository with any changes you need :)
>
> I guess I'll just go ahead then and package xdp-tools :)

That would be awesome! xdp-tools will definitely need BTF, though, so
I'm afraid it'll need to be compiled with LLVM at this stage...

-Toke

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: passing-through TOS/DSCP marking
  2021-06-30 20:55                       ` Toke Høiland-Jørgensen
@ 2021-07-04 14:15                         ` Daniel Golle
  2021-07-05 15:21                           ` Toke Høiland-Jørgensen
  0 siblings, 1 reply; 22+ messages in thread
From: Daniel Golle @ 2021-07-04 14:15 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen
  Cc: Jason A. Donenfeld, Florent Daigniere, WireGuard mailing list

Hi Toke,

thank you for the ongoing efforts and support on this issue.

On Wed, Jun 30, 2021 at 10:55:09PM +0200, Toke Høiland-Jørgensen wrote:
> Daniel Golle <daniel@makrotopia.org> writes:
> > ...
> >> >
> >> > In terms of toolchain: LLVM/Clang is a very bulky beast, I gave up on
> >> > that and started working on integrating GCC-10's BPF target in our build
> >> > system...
> >> 
> >> I saw that, but I have no idea if GCC's BPF target support will support
> >> this. My tentative guess would be no, unfortunately :(
> >
> > Probably you are right. When building the BPF object with GCC, the
> > result is:
> > root@OpenWrt:/usr/lib/bpf# preserve-dscp wg0 eth0
> > libbpf: elf: skipping unrecognized data section(4) .stab
> > libbpf: elf: skipping relo section(5) .rel.stab for section(4) .stab
> > libbpf: elf: skipping unrecognized data section(13) .comment
> > libbpf: BTF is required, but is missing or corrupted.
> > Couldn't open file: preserve_dscp_kern.o
> 
> Hmm, for this example it should be possible to make it run without BTF.
> I'm only using that for the map definition, so that could be changed to
> the old format; you could try this patch:
> 
> diff --git a/preserve-dscp/preserve_dscp_kern.c b/preserve-dscp/preserve_dscp_kern.c
> index 24120cb8a3ff..08248e1f0e41 100644
> --- a/preserve-dscp/preserve_dscp_kern.c
> +++ b/preserve-dscp/preserve_dscp_kern.c
> @@ -9,12 +9,12 @@
>   * otherwise clean up stale entries. Instead, we just rely on the LRU mechanism
>   * to evict old entries as the map fills up.
>   */
> -struct {
> -       __uint(type, BPF_MAP_TYPE_LRU_HASH);
> -       __type(key, __u32);
> -       __type(value, __u8);
> -       __uint(max_entries, 16384);
> -} flow_dscps SEC(".maps");
> +struct bpf_map_def SEC("maps") flow_dscps = {
> +       .type           = BPF_MAP_TYPE_LRU_HASH,
> +       .key_size       = sizeof(__u32),
> +       .value_size     = sizeof(__u8),
> +       .max_entries    = 16384,
> +};
>  
>  const volatile static int ip_only = 0;
> 

That change gives me the next error:
libbpf: map '' (legacy): static maps are not supported

Also speaks for itself...

> > Using the LLVM/Clang compiled object also doesn't work:
> > root@OpenWrt:/usr/lib/bpf# preserve-dscp wg0 eth0
> > libbpf: Error in bpf_create_map_xattr(flow_dscps):Operation not permitted(-1). Retrying without BTF.
> > libbpf: map 'flow_dscps': failed to create: Operation not permitted(-1)
> > libbpf: permission error while running as root; try raising 'ulimit -l'? current value: 512.0 KiB
> > libbpf: failed to load object 'preserve_dscp_kern.o'
> > Failed to load object
> >
> > Probably Kernel 5.4.124 is too old...?
> 
> Here I think the hint is in the error message ;)

Yep, I realized I had to increase it to ulimit to 2048...
With that at least the LLVM/Clang generated BPF object seems to load
properly, and I can load and unload it as expected.


> 
> >> An alternative to getting LLVM built as part of the OpenWrt toolchain is
> >> to just use the host clang to build the BPF binaries. It doesn't
> >> actually need to be cross-compiled with a special compiler, the BPF byte
> >> code format is the same on all architectures except for endianness, so
> >> just passing that to the host clang should theoretically be enough...
> >
> > I believe that having a way to build BPF objects compatible with the
> > target built-into our toolchain would be a huge step forward.
> > And given that gcc already get's pretty far, I think it'd be worth
> > fixing/patching what ever is missing (I haven't even tried GCC-11 yet)
> 
> For this example that might work (as noted above), but for other things
> BTF is a hard requirement, and I don't believe GCC supports that at all,
> sadly :(

It looks like this has changed very recently:

https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=d5cf2b5db325fd5c053ca7bc8d6a54a06cd71124


> 
> > Find my staging tree including 'preserve-dscp' ready to play with:
> >
> > https://git.openwrt.org/?p=openwrt/staging/dangole.git;a=shortlog;h=refs/heads/gcc10-bpf
> >
> > Select 'Enable experimental features by default', but note that toolchain
> > doesn't build when selecting Linux 5.10 for x86, so you need to un-select
> > 'Use testing Kernel' if building for x86.
> > And have a look at the patch for allow building bpf-examples BPF objects
> > with GCC in package/network/utils/bpf-examples/patches
> >
> >
> >> 
> >> > In terms of kernel support: recent kernels don't build yet because of
> >> > gelf_getsymshndx, so we got to update libelf first for that. Recent
> >> > libelf doesn't seem to be an option yet on many of the build hosts we
> >> > currently support (Darwin and such).
> >> >
> >> > In terms of library support: our build of libbpf comes from Linux
> >> > release tarballs. There isn't yet a release supporting bpf_tc_attach,
> >> > the easiest would be to wait for Linux 5.13 to be released.
> >> 
> >> I used the libbpf TC loading support for convenience, but it's possible
> >> to load it using 'tc' as well without too much trouble (right now the
> >> userspace component sets a config variable before loading the program,
> >> but it can be restructured to not need that).
> >> 
> >> Alternatively, the bpf-examples repository is setup with a libbpf
> >> submodule that it can link statically against, so you could use that for
> >> now?
> >
> > I've updated to 5.13 + patches on top, so now it builds :)
> 
> Alright, that works.
> 
> > Library-embedding is a no-go for OpenWrt. Having different ABI-versions
> > of libraries installed simultanously works, so we can just ship with
> > a more recent version of libbpf.
> 
> Yeah, I wasn't suggesting it as a permanent solution, just so you could
> test it out :)

In the long run it would be great to have a somehow standardized and
reproducible way to build packages containing targetted BPF objects.
As having LLVM/Clang built-into OpenWrt has shown to be a huge amount
of work, I'm looking forward to GCC-12 which will support BTF from how
it looks by now...

> 
> >> > I (of course ;) also tried and spend almost a day looking for a
> >> > quick-and-dirty path for temporary deployment, so I could at least give
> >> > feedback -- bpf-examples also isn't exactly made to be cross-compiled
> >> > manually, so I have failed with that as well so far.
> >> 
> >> Heh, no, it isn't, really. Anything in particular you need to make this
> >> easier? We already added some bits to xdp-tools for supporting
> >> cross-compilation (and that shares some lineage with bpf-examples), so
> >> porting those over should not be too difficult.
> >
> > I found my way around, see the packaging for bpf-examples in the tree
> > (link above, at path stated above)
> 
> Right, I see. 

I have managed to test your solution and it seems to do the job.
Remaining issues:
 * What to do if there are many tunnels all sharing the same upstream
   interface? In this case I'm thinking of doing:
   preserve-dscp wg0 eth0
   preserve-dscp wg1 eth0
   preserve-dscp wg2 eth0
   ...
   But I'm unsure whether this is indented or if further details need
   to be implemented in order to make that work.

 * Once a wireguard interface goes down, one cannot unload the
   remaining program on the upstream interface, as
   preserve-dscp wg0 eth0 --unload
   would fail in case of 'wg0' having gone missing.
   What do you suggest to do in this case?


> 
> >> 
> >> See: https://github.com/xdp-project/xdp-tools/pull/78 and
> >> https://github.com/xdp-project/xdp-tools/issues/74
> >> 
> >> Unfortunately I don't have a lot of time to poke more at this right now,
> >> but feel free to open up an issue / pull request to the bpf-examples
> >> repository with any changes you need :)
> >
> > I guess I'll just go ahead then and package xdp-tools :)
> 
> That would be awesome! xdp-tools will definitely need BTF, though, so
> I'm afraid it'll need to be compiled with LLVM at this stage...

I'll probably move on doing other things for a while then and get back
to it once GCC-12 has been released...


Cheers


Daniel

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: passing-through TOS/DSCP marking
  2021-07-04 14:15                         ` Daniel Golle
@ 2021-07-05 15:21                           ` Toke Høiland-Jørgensen
  2021-07-05 16:05                             ` Daniel Golle
  0 siblings, 1 reply; 22+ messages in thread
From: Toke Høiland-Jørgensen @ 2021-07-05 15:21 UTC (permalink / raw)
  To: Daniel Golle
  Cc: Jason A. Donenfeld, Florent Daigniere, WireGuard mailing list

Daniel Golle <daniel@makrotopia.org> writes:

> Hi Toke,
>
> thank you for the ongoing efforts and support on this issue.

You're welcome! :)

> On Wed, Jun 30, 2021 at 10:55:09PM +0200, Toke Høiland-Jørgensen wrote:
>> Daniel Golle <daniel@makrotopia.org> writes:
>> > ...
>> >> >
>> >> > In terms of toolchain: LLVM/Clang is a very bulky beast, I gave up on
>> >> > that and started working on integrating GCC-10's BPF target in our build
>> >> > system...
>> >> 
>> >> I saw that, but I have no idea if GCC's BPF target support will support
>> >> this. My tentative guess would be no, unfortunately :(
>> >
>> > Probably you are right. When building the BPF object with GCC, the
>> > result is:
>> > root@OpenWrt:/usr/lib/bpf# preserve-dscp wg0 eth0
>> > libbpf: elf: skipping unrecognized data section(4) .stab
>> > libbpf: elf: skipping relo section(5) .rel.stab for section(4) .stab
>> > libbpf: elf: skipping unrecognized data section(13) .comment
>> > libbpf: BTF is required, but is missing or corrupted.
>> > Couldn't open file: preserve_dscp_kern.o
>> 
>> Hmm, for this example it should be possible to make it run without BTF.
>> I'm only using that for the map definition, so that could be changed to
>> the old format; you could try this patch:
>> 
>> diff --git a/preserve-dscp/preserve_dscp_kern.c b/preserve-dscp/preserve_dscp_kern.c
>> index 24120cb8a3ff..08248e1f0e41 100644
>> --- a/preserve-dscp/preserve_dscp_kern.c
>> +++ b/preserve-dscp/preserve_dscp_kern.c
>> @@ -9,12 +9,12 @@
>>   * otherwise clean up stale entries. Instead, we just rely on the LRU mechanism
>>   * to evict old entries as the map fills up.
>>   */
>> -struct {
>> -       __uint(type, BPF_MAP_TYPE_LRU_HASH);
>> -       __type(key, __u32);
>> -       __type(value, __u8);
>> -       __uint(max_entries, 16384);
>> -} flow_dscps SEC(".maps");
>> +struct bpf_map_def SEC("maps") flow_dscps = {
>> +       .type           = BPF_MAP_TYPE_LRU_HASH,
>> +       .key_size       = sizeof(__u32),
>> +       .value_size     = sizeof(__u8),
>> +       .max_entries    = 16384,
>> +};
>>  
>>  const volatile static int ip_only = 0;
>> 
>
> That change gives me the next error:
> libbpf: map '' (legacy): static maps are not supported
>
> Also speaks for itself...

Ah, right, the ip_only config also needs to be moved to an old-style
map, then...

>> > Using the LLVM/Clang compiled object also doesn't work:
>> > root@OpenWrt:/usr/lib/bpf# preserve-dscp wg0 eth0
>> > libbpf: Error in bpf_create_map_xattr(flow_dscps):Operation not permitted(-1). Retrying without BTF.
>> > libbpf: map 'flow_dscps': failed to create: Operation not permitted(-1)
>> > libbpf: permission error while running as root; try raising 'ulimit -l'? current value: 512.0 KiB
>> > libbpf: failed to load object 'preserve_dscp_kern.o'
>> > Failed to load object
>> >
>> > Probably Kernel 5.4.124 is too old...?
>> 
>> Here I think the hint is in the error message ;)
>
> Yep, I realized I had to increase it to ulimit to 2048...
> With that at least the LLVM/Clang generated BPF object seems to load
> properly, and I can load and unload it as expected.

Awesome!

>> >> An alternative to getting LLVM built as part of the OpenWrt toolchain is
>> >> to just use the host clang to build the BPF binaries. It doesn't
>> >> actually need to be cross-compiled with a special compiler, the BPF byte
>> >> code format is the same on all architectures except for endianness, so
>> >> just passing that to the host clang should theoretically be enough...
>> >
>> > I believe that having a way to build BPF objects compatible with the
>> > target built-into our toolchain would be a huge step forward.
>> > And given that gcc already get's pretty far, I think it'd be worth
>> > fixing/patching what ever is missing (I haven't even tried GCC-11 yet)
>> 
>> For this example that might work (as noted above), but for other things
>> BTF is a hard requirement, and I don't believe GCC supports that at all,
>> sadly :(
>
> It looks like this has changed very recently:
>
> https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=d5cf2b5db325fd5c053ca7bc8d6a54a06cd71124

Uhh, exciting! Will be interesting to see how compatible this will be;
and how BPF upstream will deal with multiple compilers :)

>> > Find my staging tree including 'preserve-dscp' ready to play with:
>> >
>> > https://git.openwrt.org/?p=openwrt/staging/dangole.git;a=shortlog;h=refs/heads/gcc10-bpf
>> >
>> > Select 'Enable experimental features by default', but note that toolchain
>> > doesn't build when selecting Linux 5.10 for x86, so you need to un-select
>> > 'Use testing Kernel' if building for x86.
>> > And have a look at the patch for allow building bpf-examples BPF objects
>> > with GCC in package/network/utils/bpf-examples/patches
>> >
>> >
>> >> 
>> >> > In terms of kernel support: recent kernels don't build yet because of
>> >> > gelf_getsymshndx, so we got to update libelf first for that. Recent
>> >> > libelf doesn't seem to be an option yet on many of the build hosts we
>> >> > currently support (Darwin and such).
>> >> >
>> >> > In terms of library support: our build of libbpf comes from Linux
>> >> > release tarballs. There isn't yet a release supporting bpf_tc_attach,
>> >> > the easiest would be to wait for Linux 5.13 to be released.
>> >> 
>> >> I used the libbpf TC loading support for convenience, but it's possible
>> >> to load it using 'tc' as well without too much trouble (right now the
>> >> userspace component sets a config variable before loading the program,
>> >> but it can be restructured to not need that).
>> >> 
>> >> Alternatively, the bpf-examples repository is setup with a libbpf
>> >> submodule that it can link statically against, so you could use that for
>> >> now?
>> >
>> > I've updated to 5.13 + patches on top, so now it builds :)
>> 
>> Alright, that works.
>> 
>> > Library-embedding is a no-go for OpenWrt. Having different ABI-versions
>> > of libraries installed simultanously works, so we can just ship with
>> > a more recent version of libbpf.
>> 
>> Yeah, I wasn't suggesting it as a permanent solution, just so you could
>> test it out :)
>
> In the long run it would be great to have a somehow standardized and
> reproducible way to build packages containing targetted BPF objects.
> As having LLVM/Clang built-into OpenWrt has shown to be a huge amount
> of work, I'm looking forward to GCC-12 which will support BTF from how
> it looks by now...
>
>> 
>> >> > I (of course ;) also tried and spend almost a day looking for a
>> >> > quick-and-dirty path for temporary deployment, so I could at least give
>> >> > feedback -- bpf-examples also isn't exactly made to be cross-compiled
>> >> > manually, so I have failed with that as well so far.
>> >> 
>> >> Heh, no, it isn't, really. Anything in particular you need to make this
>> >> easier? We already added some bits to xdp-tools for supporting
>> >> cross-compilation (and that shares some lineage with bpf-examples), so
>> >> porting those over should not be too difficult.
>> >
>> > I found my way around, see the packaging for bpf-examples in the tree
>> > (link above, at path stated above)
>> 
>> Right, I see. 
>
> I have managed to test your solution and it seems to do the job.
> Remaining issues:
>  * What to do if there are many tunnels all sharing the same upstream
>    interface? In this case I'm thinking of doing:
>    preserve-dscp wg0 eth0
>    preserve-dscp wg1 eth0
>    preserve-dscp wg2 eth0
>    ...
>    But I'm unsure whether this is indented or if further details need
>    to be implemented in order to make that work.

Hmm, not sure whether that will work out of the box, actually. Would
definitely be doable to make the userspace utility understand how to do
this properly, though. There's nothing in principle preventing this from
working; the loader should just be smart enough to do incremental
loading of multiple "ingress" programs while still sharing the map
between all of them.

The only potential operational issue with using it on multiple wg
interfaces is if they share IP space; because in that case you might
have packets from different tunnels ending up with identical hashes,
confusing the egress side. Fixing this would require the outer BPF
program to know about wg endpoint addresses and map the packets back to
their inner ifindexes using that. But as long as the wireguard tunnels
are using different IP subnets (or mostly forwarding traffic without the
inner addresses as sources or destinations), the hash collision
probability should not be bigger than just traffic on a single tunnel, I
suppose.

One particular thing to watch out for here is IPv6 link-local traffic;
sine wg doesn't generate link-local addresses automatically, they are
commonly configured with (the same) static address (like fe80::1 or
fe80::2), which would make link-local traffic identical across wg
interfaces. But this is only used for particular setups (I use it for
running Babel over wg, for instance), just make sure it won't be an
issue for your deployment scenario :)

>  * Once a wireguard interface goes down, one cannot unload the
>    remaining program on the upstream interface, as
>    preserve-dscp wg0 eth0 --unload
>    would fail in case of 'wg0' having gone missing.
>    What do you suggest to do in this case?

Just fixing the userspace utility to deal with this case properly as
well is probably the easiest. How are you thinking you'd deploy this?
Via ifup hooks on openwrt, or something different?

>> >> See: https://github.com/xdp-project/xdp-tools/pull/78 and
>> >> https://github.com/xdp-project/xdp-tools/issues/74
>> >> 
>> >> Unfortunately I don't have a lot of time to poke more at this right now,
>> >> but feel free to open up an issue / pull request to the bpf-examples
>> >> repository with any changes you need :)
>> >
>> > I guess I'll just go ahead then and package xdp-tools :)
>> 
>> That would be awesome! xdp-tools will definitely need BTF, though, so
>> I'm afraid it'll need to be compiled with LLVM at this stage...
>
> I'll probably move on doing other things for a while then and get back
> to it once GCC-12 has been released...

OK, SGTM.

-Toke

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: passing-through TOS/DSCP marking
  2021-07-05 15:21                           ` Toke Høiland-Jørgensen
@ 2021-07-05 16:05                             ` Daniel Golle
  2021-07-05 16:59                               ` Toke Høiland-Jørgensen
  0 siblings, 1 reply; 22+ messages in thread
From: Daniel Golle @ 2021-07-05 16:05 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen
  Cc: Jason A. Donenfeld, Florent Daigniere, WireGuard mailing list

On Mon, Jul 05, 2021 at 05:21:25PM +0200, Toke Høiland-Jørgensen wrote:
> Daniel Golle <daniel@makrotopia.org> writes:
> ...
> > I have managed to test your solution and it seems to do the job.
> > Remaining issues:
> >  * What to do if there are many tunnels all sharing the same upstream
> >    interface? In this case I'm thinking of doing:
> >    preserve-dscp wg0 eth0
> >    preserve-dscp wg1 eth0
> >    preserve-dscp wg2 eth0
> >    ...
> >    But I'm unsure whether this is indented or if further details need
> >    to be implemented in order to make that work.
> 
> Hmm, not sure whether that will work out of the box, actually. Would
> definitely be doable to make the userspace utility understand how to do
> this properly, though. There's nothing in principle preventing this from
> working; the loader should just be smart enough to do incremental
> loading of multiple "ingress" programs while still sharing the map
> between all of them.

You make it at least sound easy :)

> 
> The only potential operational issue with using it on multiple wg
> interfaces is if they share IP space; because in that case you might
> have packets from different tunnels ending up with identical hashes,
> confusing the egress side. Fixing this would require the outer BPF
> program to know about wg endpoint addresses and map the packets back to
> their inner ifindexes using that. But as long as the wireguard tunnels
> are using different IP subnets (or mostly forwarding traffic without the
> inner addresses as sources or destinations), the hash collision
> probability should not be bigger than just traffic on a single tunnel, I
> suppose.
> 
> One particular thing to watch out for here is IPv6 link-local traffic;
> sine wg doesn't generate link-local addresses automatically, they are
> commonly configured with (the same) static address (like fe80::1 or
> fe80::2), which would make link-local traffic identical across wg
> interfaces. But this is only used for particular setups (I use it for
> running Babel over wg, for instance), just make sure it won't be an
> issue for your deployment scenario :)

All this is good to know, but from what I can see now shouldn't be
a problem in our deployment -- it's multiple wireguard links which are
(using fwmark and ip rules) routed over several uplinks. We then use
mwan3 to balance most of the gateway traffic accross the available
wireguard interfaces, using MASQ/SNAT on each tunnel which has a
unique transfer network assigned, and no IPv6 at all.
Hence it should be ok to go under the restrictions you described.

> >  * Once a wireguard interface goes down, one cannot unload the
> >    remaining program on the upstream interface, as
> >    preserve-dscp wg0 eth0 --unload
> >    would fail in case of 'wg0' having gone missing.
> >    What do you suggest to do in this case?
> 
> Just fixing the userspace utility to deal with this case properly as
> well is probably the easiest. How are you thinking you'd deploy this?
> Via ifup hooks on openwrt, or something different?

Yes, I use ifup hooks configured in an init script for procd and have
it tied to the wireguard config sections in /etc/config/network:

https://git.openwrt.org/?p=openwrt/staging/dangole.git;a=blob;f=package/network/utils/bpf-examples/files/wireguard-preserve-dscp.init;h=f1e5e25e663308e057285e2bd8e3bcb9560bdd54;hb=5923a78d74be3f05e734b0be0a832a87be8d369b#l56

Passing multiple inner interfaces to one call to the to-be-modified
preserve-dscp tool could be achieved by some shell magic dealing with
the configuration...

We will have to restart the filter for all inner interfaces in case of
one being added or removed, right?

And maybe I'll come up with some state tracking so orphaned filters can
be removed after configuration changes...



Cheers


Daniel

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: passing-through TOS/DSCP marking
  2021-07-05 16:05                             ` Daniel Golle
@ 2021-07-05 16:59                               ` Toke Høiland-Jørgensen
  2021-07-05 17:26                                 ` Daniel Golle
  0 siblings, 1 reply; 22+ messages in thread
From: Toke Høiland-Jørgensen @ 2021-07-05 16:59 UTC (permalink / raw)
  To: Daniel Golle
  Cc: Jason A. Donenfeld, Florent Daigniere, WireGuard mailing list

Daniel Golle <daniel@makrotopia.org> writes:

> On Mon, Jul 05, 2021 at 05:21:25PM +0200, Toke Høiland-Jørgensen wrote:
>> Daniel Golle <daniel@makrotopia.org> writes:
>> ...
>> > I have managed to test your solution and it seems to do the job.
>> > Remaining issues:
>> >  * What to do if there are many tunnels all sharing the same upstream
>> >    interface? In this case I'm thinking of doing:
>> >    preserve-dscp wg0 eth0
>> >    preserve-dscp wg1 eth0
>> >    preserve-dscp wg2 eth0
>> >    ...
>> >    But I'm unsure whether this is indented or if further details need
>> >    to be implemented in order to make that work.
>> 
>> Hmm, not sure whether that will work out of the box, actually. Would
>> definitely be doable to make the userspace utility understand how to do
>> this properly, though. There's nothing in principle preventing this from
>> working; the loader should just be smart enough to do incremental
>> loading of multiple "ingress" programs while still sharing the map
>> between all of them.
>
> You make it at least sound easy :)

I'd say the implementation is relatively straight-forward for anyone
familiar with how BPF works; figuring out how it's *supposed* to work is
the hard bit ;)

>> The only potential operational issue with using it on multiple wg
>> interfaces is if they share IP space; because in that case you might
>> have packets from different tunnels ending up with identical hashes,
>> confusing the egress side. Fixing this would require the outer BPF
>> program to know about wg endpoint addresses and map the packets back to
>> their inner ifindexes using that. But as long as the wireguard tunnels
>> are using different IP subnets (or mostly forwarding traffic without the
>> inner addresses as sources or destinations), the hash collision
>> probability should not be bigger than just traffic on a single tunnel, I
>> suppose.
>> 
>> One particular thing to watch out for here is IPv6 link-local traffic;
>> sine wg doesn't generate link-local addresses automatically, they are
>> commonly configured with (the same) static address (like fe80::1 or
>> fe80::2), which would make link-local traffic identical across wg
>> interfaces. But this is only used for particular setups (I use it for
>> running Babel over wg, for instance), just make sure it won't be an
>> issue for your deployment scenario :)
>
> All this is good to know, but from what I can see now shouldn't be
> a problem in our deployment -- it's multiple wireguard links which are
> (using fwmark and ip rules) routed over several uplinks. We then use
> mwan3 to balance most of the gateway traffic accross the available
> wireguard interfaces, using MASQ/SNAT on each tunnel which has a
> unique transfer network assigned, and no IPv6 at all.
> Hence it should be ok to go under the restrictions you described.

Alright, so the wireguard-to-physical interfaces is always many-to-one?
I.e., each wireguard interface is always routed out the same physical
interface, but there may be multiple wg interfaces sharing the same
uplink?

I'm asking because in that case it does make sense to keep separate
instances of the whole setup per physical interface to limit hash
collisions; otherwise, the lookup table could also be made global and
shared between all physical interfaces, so you'd avoid having to specify
the relationship explicitly...

>> >  * Once a wireguard interface goes down, one cannot unload the
>> >    remaining program on the upstream interface, as
>> >    preserve-dscp wg0 eth0 --unload
>> >    would fail in case of 'wg0' having gone missing.
>> >    What do you suggest to do in this case?
>> 
>> Just fixing the userspace utility to deal with this case properly as
>> well is probably the easiest. How are you thinking you'd deploy this?
>> Via ifup hooks on openwrt, or something different?
>
> Yes, I use ifup hooks configured in an init script for procd and have
> it tied to the wireguard config sections in /etc/config/network:
>
> https://git.openwrt.org/?p=openwrt/staging/dangole.git;a=blob;f=package/network/utils/bpf-examples/files/wireguard-preserve-dscp.init;h=f1e5e25e663308e057285e2bd8e3bcb9560bdd54;hb=5923a78d74be3f05e734b0be0a832a87be8d369b#l56
>
> Passing multiple inner interfaces to one call to the to-be-modified
> preserve-dscp tool could be achieved by some shell magic dealing with
> the configuration...

Not necessary: it's perfectly fine to attach them one at a time.

> We will have to restart the filter for all inner interfaces in case of
> one being added or removed, right?

Nope, that's no necessary either. We can just re-attach the same filter
program to each additional interface.

> And maybe I'll come up with some state tracking so orphaned filters can
> be removed after configuration changes...

The userspace loader could be made to detect this and automatically
clean up the program on the physical interface after the last internal
interface goes away. At least as long as we can rely on an ifdown hook
this will be fairly straight-forward (just requires a lock to not be
racy). Detecting it after interfaces are automatically removed from the
kernel is a bit more cumbersome as it would require some way to trigger
the garbage collection.

-Toke

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: passing-through TOS/DSCP marking
  2021-07-05 16:59                               ` Toke Høiland-Jørgensen
@ 2021-07-05 17:26                                 ` Daniel Golle
  2021-07-05 21:20                                   ` Toke Høiland-Jørgensen
  0 siblings, 1 reply; 22+ messages in thread
From: Daniel Golle @ 2021-07-05 17:26 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen
  Cc: Jason A. Donenfeld, Florent Daigniere, WireGuard mailing list

Hi Toke,

On Mon, Jul 05, 2021 at 06:59:10PM +0200, Toke Høiland-Jørgensen wrote:
> Daniel Golle <daniel@makrotopia.org> writes:
> > ...
> >> The only potential operational issue with using it on multiple wg
> >> interfaces is if they share IP space; because in that case you might
> >> have packets from different tunnels ending up with identical hashes,
> >> confusing the egress side. Fixing this would require the outer BPF
> >> program to know about wg endpoint addresses and map the packets back to
> >> their inner ifindexes using that. But as long as the wireguard tunnels
> >> are using different IP subnets (or mostly forwarding traffic without the
> >> inner addresses as sources or destinations), the hash collision
> >> probability should not be bigger than just traffic on a single tunnel, I
> >> suppose.
> >> 
> >> One particular thing to watch out for here is IPv6 link-local traffic;
> >> sine wg doesn't generate link-local addresses automatically, they are
> >> commonly configured with (the same) static address (like fe80::1 or
> >> fe80::2), which would make link-local traffic identical across wg
> >> interfaces. But this is only used for particular setups (I use it for
> >> running Babel over wg, for instance), just make sure it won't be an
> >> issue for your deployment scenario :)
> >
> > All this is good to know, but from what I can see now shouldn't be
> > a problem in our deployment -- it's multiple wireguard links which are
> > (using fwmark and ip rules) routed over several uplinks. We then use
> > mwan3 to balance most of the gateway traffic accross the available
> > wireguard interfaces, using MASQ/SNAT on each tunnel which has a
> > unique transfer network assigned, and no IPv6 at all.
> > Hence it should be ok to go under the restrictions you described.
> 
> Alright, so the wireguard-to-physical interfaces is always many-to-one?
> I.e., each wireguard interface is always routed out the same physical
> interface, but there may be multiple wg interfaces sharing the same
> uplink?

Well, on the access concentrator in the datacentre this is the case:
All wireguard tunnels are using the same interface.

On the remote system there are *many* tunnels to the same access
concentrator, each routed over a different uplink interface.
So there it's a 1:1 mapping, each wgX has it's distinct ethX.
(and there the current solution already works fine)

> 
> I'm asking because in that case it does make sense to keep separate
> instances of the whole setup per physical interface to limit hash
> collisions; otherwise, the lookup table could also be made global and
> shared between all physical interfaces, so you'd avoid having to specify
> the relationship explicitly...
> 
> >> >  * Once a wireguard interface goes down, one cannot unload the
> >> >    remaining program on the upstream interface, as
> >> >    preserve-dscp wg0 eth0 --unload
> >> >    would fail in case of 'wg0' having gone missing.
> >> >    What do you suggest to do in this case?
> >> 
> >> Just fixing the userspace utility to deal with this case properly as
> >> well is probably the easiest. How are you thinking you'd deploy this?
> >> Via ifup hooks on openwrt, or something different?
> >
> > Yes, I use ifup hooks configured in an init script for procd and have
> > it tied to the wireguard config sections in /etc/config/network:
> >
> > https://git.openwrt.org/?p=openwrt/staging/dangole.git;a=blob;f=package/network/utils/bpf-examples/files/wireguard-preserve-dscp.init;h=f1e5e25e663308e057285e2bd8e3bcb9560bdd54;hb=5923a78d74be3f05e734b0be0a832a87be8d369b#l56
> >
> > Passing multiple inner interfaces to one call to the to-be-modified
> > preserve-dscp tool could be achieved by some shell magic dealing with
> > the configuration...
> 
> Not necessary: it's perfectly fine to attach them one at a time.

So assume we changed the userspace tool to accept multiple inner
interfaces, let's say we called:
preserve-dscp wg0 wg1 wg2 eth0

And then, at some later point in time we want to add 'wg3'.
So calling
preserve-dscp wg0 wg1 wg2 wg3 eth0
could just work without interrupting ongoing service on wg0..wg2?
That would definitely require the userspace tool to track some local
state and store it in /var/lib/foo/...? Or am I getting something
wrong here?

> 
> > We will have to restart the filter for all inner interfaces in case of
> > one being added or removed, right?
> 
> Nope, that's no necessary either. We can just re-attach the same filter
> program to each additional interface.

So given the above example, I could then just call
preserve-dscp wg3 eth0
to add eth3 while wg0..wg2 keep working?

> 
> > And maybe I'll come up with some state tracking so orphaned filters can
> > be removed after configuration changes...
> 
> The userspace loader could be made to detect this and automatically
> clean up the program on the physical interface after the last internal
> interface goes away. At least as long as we can rely on an ifdown hook
> this will be fairly straight-forward (just requires a lock to not be
> racy). Detecting it after interfaces are automatically removed from the
> kernel is a bit more cumbersome as it would require some way to trigger
> the garbage collection.

I'll look into that and how we are intending to handle that in
general in OpenWrt. John was working on that I believe, I'll ask him
first.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: passing-through TOS/DSCP marking
  2021-07-05 17:26                                 ` Daniel Golle
@ 2021-07-05 21:20                                   ` Toke Høiland-Jørgensen
  0 siblings, 0 replies; 22+ messages in thread
From: Toke Høiland-Jørgensen @ 2021-07-05 21:20 UTC (permalink / raw)
  To: Daniel Golle
  Cc: Jason A. Donenfeld, Florent Daigniere, WireGuard mailing list

Daniel Golle <daniel@makrotopia.org> writes:

> Hi Toke,
>
> On Mon, Jul 05, 2021 at 06:59:10PM +0200, Toke Høiland-Jørgensen wrote:
>> Daniel Golle <daniel@makrotopia.org> writes:
>> > ...
>> >> The only potential operational issue with using it on multiple wg
>> >> interfaces is if they share IP space; because in that case you might
>> >> have packets from different tunnels ending up with identical hashes,
>> >> confusing the egress side. Fixing this would require the outer BPF
>> >> program to know about wg endpoint addresses and map the packets back to
>> >> their inner ifindexes using that. But as long as the wireguard tunnels
>> >> are using different IP subnets (or mostly forwarding traffic without the
>> >> inner addresses as sources or destinations), the hash collision
>> >> probability should not be bigger than just traffic on a single tunnel, I
>> >> suppose.
>> >> 
>> >> One particular thing to watch out for here is IPv6 link-local traffic;
>> >> sine wg doesn't generate link-local addresses automatically, they are
>> >> commonly configured with (the same) static address (like fe80::1 or
>> >> fe80::2), which would make link-local traffic identical across wg
>> >> interfaces. But this is only used for particular setups (I use it for
>> >> running Babel over wg, for instance), just make sure it won't be an
>> >> issue for your deployment scenario :)
>> >
>> > All this is good to know, but from what I can see now shouldn't be
>> > a problem in our deployment -- it's multiple wireguard links which are
>> > (using fwmark and ip rules) routed over several uplinks. We then use
>> > mwan3 to balance most of the gateway traffic accross the available
>> > wireguard interfaces, using MASQ/SNAT on each tunnel which has a
>> > unique transfer network assigned, and no IPv6 at all.
>> > Hence it should be ok to go under the restrictions you described.
>> 
>> Alright, so the wireguard-to-physical interfaces is always many-to-one?
>> I.e., each wireguard interface is always routed out the same physical
>> interface, but there may be multiple wg interfaces sharing the same
>> uplink?
>
> Well, on the access concentrator in the datacentre this is the case:
> All wireguard tunnels are using the same interface.
>
> On the remote system there are *many* tunnels to the same access
> concentrator, each routed over a different uplink interface.
> So there it's a 1:1 mapping, each wgX has it's distinct ethX.
> (and there the current solution already works fine)

Alright. The important thing here is that no single wg interface splits
its traffic over multiple uplinks, so that's all fine :)

>> I'm asking because in that case it does make sense to keep separate
>> instances of the whole setup per physical interface to limit hash
>> collisions; otherwise, the lookup table could also be made global and
>> shared between all physical interfaces, so you'd avoid having to specify
>> the relationship explicitly...
>> 
>> >> >  * Once a wireguard interface goes down, one cannot unload the
>> >> >    remaining program on the upstream interface, as
>> >> >    preserve-dscp wg0 eth0 --unload
>> >> >    would fail in case of 'wg0' having gone missing.
>> >> >    What do you suggest to do in this case?
>> >> 
>> >> Just fixing the userspace utility to deal with this case properly as
>> >> well is probably the easiest. How are you thinking you'd deploy this?
>> >> Via ifup hooks on openwrt, or something different?
>> >
>> > Yes, I use ifup hooks configured in an init script for procd and have
>> > it tied to the wireguard config sections in /etc/config/network:
>> >
>> > https://git.openwrt.org/?p=openwrt/staging/dangole.git;a=blob;f=package/network/utils/bpf-examples/files/wireguard-preserve-dscp.init;h=f1e5e25e663308e057285e2bd8e3bcb9560bdd54;hb=5923a78d74be3f05e734b0be0a832a87be8d369b#l56
>> >
>> > Passing multiple inner interfaces to one call to the to-be-modified
>> > preserve-dscp tool could be achieved by some shell magic dealing with
>> > the configuration...
>> 
>> Not necessary: it's perfectly fine to attach them one at a time.
>
> So assume we changed the userspace tool to accept multiple inner
> interfaces, let's say we called:
> preserve-dscp wg0 wg1 wg2 eth0
>
> And then, at some later point in time we want to add 'wg3'.
> So calling
> preserve-dscp wg0 wg1 wg2 wg3 eth0
> could just work without interrupting ongoing service on wg0..wg2?
> That would definitely require the userspace tool to track some local
> state and store it in /var/lib/foo/...? Or am I getting something
> wrong here?

Not necessarily; we can extract the information we need from the kernel:

The whole thing works by having one BPF program that extracts the DSCP
value on the wg interface and puts it into a map, keyed by the packet
hash; and then the other program on the physical interface reads that
same map and looks up packet hashes in it. So what this means it that
the critical thing is that the two BPF programs share the same map, or
the information won't be there.

In other words, when loading the DSCP-parsing program on a second wg
interface, the essential bit is that it ends up using the same map as
the already-loaded program on the physical interface. Either by loading
a second copy of the BPF program and re-using the existing map, or by
just attaching the same program to multiple interfaces.

As for how we find the map (or program) to attach to the second
interface, we can either "pin" the reference in a special bpffs and load
it from there, or we can use the kernel introspection APIs to discover
which map ID is currently used by the program on the physical interface,
and then open that map by ID (if just reusing the map), or iterate
through existing BPF programs looking for one which uses this map, then
attaching that to a second interface. Using bpffs is slightly more
convenient (doesn't require iterating programs), but has the drawback
that it adds a dependency on the bpffs itself, and that it needs an
explicit removal step (unlinking the pinned reference).

No matter which mechanism is used, the loading of the parsing program on
a second wg interface is completely transparent to any program already
running: it'll just be another source of data going into the map that
the physical interface program can then read. Think of it as multiple
clients writing to the same database: as long as they agree on which
database they're using, clients can come and go without interfering with
each other.

>> > We will have to restart the filter for all inner interfaces in case of
>> > one being added or removed, right?
>> 
>> Nope, that's no necessary either. We can just re-attach the same filter
>> program to each additional interface.
>
> So given the above example, I could then just call
> preserve-dscp wg3 eth0
> to add eth3 while wg0..wg2 keep working?

Yup, see above.

>> > And maybe I'll come up with some state tracking so orphaned filters can
>> > be removed after configuration changes...
>> 
>> The userspace loader could be made to detect this and automatically
>> clean up the program on the physical interface after the last internal
>> interface goes away. At least as long as we can rely on an ifdown hook
>> this will be fairly straight-forward (just requires a lock to not be
>> racy). Detecting it after interfaces are automatically removed from the
>> kernel is a bit more cumbersome as it would require some way to trigger
>> the garbage collection.
>
> I'll look into that and how we are intending to handle that in
> general in OpenWrt. John was working on that I believe, I'll ask him
> first.

OK, cool.

-Toke

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: passing-through TOS/DSCP marking
  2021-06-16 16:28 ` Jason A. Donenfeld
  2021-06-16 19:26   ` Daniel Golle
@ 2021-07-06  7:00   ` Florent Daigniere
  2021-07-06 20:08     ` Luiz Angelo Daros de Luca
  1 sibling, 1 reply; 22+ messages in thread
From: Florent Daigniere @ 2021-07-06  7:00 UTC (permalink / raw)
  To: Jason A. Donenfeld; +Cc: WireGuard mailing list

On Wed, 2021-06-16 at 18:28 +0200, Jason A. Donenfeld wrote:
> WireGuard does not copy the inner DSCP mark to the outside, aside from
> the ECN bits, in order to avoid a data leak.
> 
> Jason

Hi Jason,

Is there any room for revisiting this design decision? We are talking
about 6 bits of metadata per packet here...

Which realistic threats are we trying to protect against?

The solutions that don't involve code changes all have significant
drawbacks:
- awesome BPF-based magic will be Linux only
- multiple tunnels are not always practical and arguably worse traffic
correlation-wise.

I still use a patched wireguard to protect traffic from a voip app on an
android handset using wifi here... and while I have a solution that's
good enough for my requirements, I do think that the community would
benefit from having something that works better out of the box (and on
all platforms).

Florent

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: passing-through TOS/DSCP marking
  2021-07-06  7:00   ` Florent Daigniere
@ 2021-07-06 20:08     ` Luiz Angelo Daros de Luca
  0 siblings, 0 replies; 22+ messages in thread
From: Luiz Angelo Daros de Luca @ 2021-07-06 20:08 UTC (permalink / raw)
  To: WireGuard mailing list

I see talented people spending resources trying to go around a
wireguard design decision. I understand that wireguard try to keep it
as safe as possible and as simple as possible. However, passing some
traffic information to the encrypted packet is a requirement for some
setups. Wouldn't it be better to have it provided by wireguard but
disabled by default? If the change don't kill a kitty, it will not
harm the security as users are already doing the same but through a
much harder way.

My 2 cents.

---
     Luiz Angelo Daros de Luca
            luizluca@gmail.com

^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2021-07-06 20:08 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-06-16 13:24 passing-through TOS/DSCP marking Daniel Golle
2021-06-16 16:28 ` Jason A. Donenfeld
2021-06-16 19:26   ` Daniel Golle
2021-06-16 23:33     ` Toke Høiland-Jørgensen
2021-06-17  7:55       ` Florent Daigniere
2021-06-17  9:41         ` Daniel Golle
2021-06-17 12:24           ` Toke Høiland-Jørgensen
     [not found]             ` <CAMaqUZ09KRtp01OK3u-Di52X_kH9eT4E-wmnPc6QzjSCd5dEiw@mail.gmail.com>
2021-06-17 20:54               ` Toke Høiland-Jørgensen
2021-06-17 23:04             ` Toke Høiland-Jørgensen
2021-06-18 12:24               ` Jason A. Donenfeld
2021-06-21 12:36                 ` Daniel Golle
2021-06-21 14:27                   ` Toke Høiland-Jørgensen
2021-06-30 17:23                     ` Daniel Golle
2021-06-30 20:55                       ` Toke Høiland-Jørgensen
2021-07-04 14:15                         ` Daniel Golle
2021-07-05 15:21                           ` Toke Høiland-Jørgensen
2021-07-05 16:05                             ` Daniel Golle
2021-07-05 16:59                               ` Toke Høiland-Jørgensen
2021-07-05 17:26                                 ` Daniel Golle
2021-07-05 21:20                                   ` Toke Høiland-Jørgensen
2021-07-06  7:00   ` Florent Daigniere
2021-07-06 20:08     ` Luiz Angelo Daros de Luca

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).