Allowing space for packet headers in Wintun Tx/Rx

Development discussion of WireGuard
 help / color / mirror / Atom feed

* Allowing space for packet headers in Wintun Tx/Rx
@ 2021-04-07 11:49 David Woodhouse
  2021-04-07 23:15 ` Daniel Lenski
  0 siblings, 1 reply; 13+ messages in thread
From: David Woodhouse @ 2021-04-07 11:49 UTC (permalink / raw)
  To: WireGuard mailing list; +Cc: Daniel Lenski

[-- Attachment #1: Type: text/plain, Size: 1646 bytes --]

My initial port of OpenConnect to use Wintun has a fairly simple
implementation for sending and receiving packets.

The transmit function is basically this:

   WintunAllocateSendPacket()
   memcpy()
   WintunSendPacket()

I don't like the memcpy very much. I'd like to receive and decrypt
incoming VPN packets from the public network directly into the Wintun
ring.

But packets have *headers*. Cisco AnyConnect, for example, has a single
byte of 'type' field at the beginning of its DTLS packets. We decrypt
the packet we receive from the public network, and *if* the first byte
indicates that it's a data packet, we deliver the packet data starting
from the second byte onwards into the tun device.

So even if I were to receive and decrypt into a buffer allocated with
WintunAllocateSendPacket(), I couldn't *send* that without a memmove()
to move it all down by one byte.

If WintunSendPacket took an additional 'offset' argument to disregard a
certain number of bytes at the beginning of the buffer, that would
probably suffice. Or is it possible to simply add to the pointer
returned by WintunAllocateSendPacket()?

Receiving is slightly more complex, as it's too late by the time
WintunReceivePacket() is called; the data have already been received
into the top of the available space. Perhaps the header reservation for
Rx could be an additional parameter to WintunStartSession(), and the
requested amount of space would always be available before the pointer
returned from WintunReceivePacket()? (It's not clear how to *use* that
knowledge in valid and safe C though.)

Are there better options?

[-- Attachment #2: smime.p7s --]
[-- Type: application/x-pkcs7-signature, Size: 5174 bytes --]

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Allowing space for packet headers in Wintun Tx/Rx
  2021-04-07 11:49 Allowing space for packet headers in Wintun Tx/Rx David Woodhouse
@ 2021-04-07 23:15 ` Daniel Lenski
  2021-04-08 14:37   ` David Woodhouse
  0 siblings, 1 reply; 13+ messages in thread
From: Daniel Lenski @ 2021-04-07 23:15 UTC (permalink / raw)
  To: David Woodhouse; +Cc: WireGuard mailing list

On Wed, Apr 7, 2021 at 4:49 AM David Woodhouse <dwmw2@infradead.org> wrote:
> If WintunSendPacket took an additional 'offset' argument to disregard a
> certain number of bytes at the beginning of the buffer, that would
> probably suffice. Or is it possible to simply add to the pointer
> returned by WintunAllocateSendPacket()?

To expand on this possibility a little bit, I had proposed kludging a
“shift” into the allocation for the outgoing packet:

    /* Always use this instead for an outgoing packet, instead of
     * malloc(sizeof(struct pkt) + payload_len */
    BYTE *tun_pkt = WintunAllocateSendPacket(
        vpninfo->wintun_session,
        paylod_len                 /* packet payload size */
        + sizeof(struct pkt)     /* OpenConnect's internal packet header size */
    );

    /* Then after we build and populate the outgoing packet, just tell
     * Wintun to send from an offset that's NOT at the beginning of the
     * buffer it allocated for us. No memcpy! */
    WintunSendPacket(vpninfo->wintun_session, tun_pkt + sizeof(struct pkt));

The concern here is that Wintun may not have been written with this
possibility in mind, and might not always like sending from an address
other than the exact start of a buffer it's allocated for us.

Thanks,
Dan

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Allowing space for packet headers in Wintun Tx/Rx
  2021-04-07 23:15 ` Daniel Lenski
@ 2021-04-08 14:37   ` David Woodhouse
  2021-04-08 16:42     ` Daniel Lenski
  0 siblings, 1 reply; 13+ messages in thread
From: David Woodhouse @ 2021-04-08 14:37 UTC (permalink / raw)
  To: Daniel Lenski; +Cc: WireGuard mailing list

[-- Attachment #1: Type: text/plain, Size: 6752 bytes --]

On Wed, 2021-04-07 at 16:15 -0700, Daniel Lenski wrote:
> On Wed, Apr 7, 2021 at 4:49 AM David Woodhouse <dwmw2@infradead.org> wrote:
> > If WintunSendPacket took an additional 'offset' argument to disregard a
> > certain number of bytes at the beginning of the buffer, that would
> > probably suffice. Or is it possible to simply add to the pointer
> > returned by WintunAllocateSendPacket()?
> 
> To expand on this possibility a little bit, I had proposed kludging a
> “shift” into the allocation for the outgoing packet:
> 
>     /* Always use this instead for an outgoing packet, instead of
>      * malloc(sizeof(struct pkt) + payload_len */
>     BYTE *tun_pkt = WintunAllocateSendPacket(
>         vpninfo->wintun_session,
>         paylod_len                 /* packet payload size */
>         + sizeof(struct pkt)     /* OpenConnect's internal packet header size */
>     );
> 
>     /* Then after we build and populate the outgoing packet, just tell
>      * Wintun to send from an offset that's NOT at the beginning of the
>      * buffer it allocated for us. No memcpy! */
>     WintunSendPacket(vpninfo->wintun_session, tun_pkt + sizeof(struct pkt));
> 
> The concern here is that Wintun may not have been written with this
> possibility in mind, and might not always like sending from an address
> other than the exact start of a buffer it's allocated for us.

If we have to, we could probably ditch a lot of what's in the DLL's
api/session.c and manage the ring for ourselves; the fundamental
limitation right now is how the kernel driver itself handles the shared
ring.

It uses a TUN_PACKET header which is basically just a 'ULONG Size'
followed immediately by the data. To allow the application to process
packets out-of-order, the *userspace* part sets the top bit of ->Size
to manage ownership between the DLL and the application, but the kernel
side doesn't see that because it's removed by the time the DLL bumps
the ring tail to allow the kernel to see the packet.

For OpenConnect I think it's reasonable to declare that we don't care
about optimising the TCP fallback data path; it's the UDP data path
that we really want to be zero-copy, decrypting directly into the tun
ring and encrypting directly from it.

Let's take a look at the protocols we need to support...

 ========== 
 Cisco DTLS
 ==========

Cisco DTLS has only a single byte of header, for the packet type.

So for VPN→tun traffic we could actually just receive the packet (with
gnutls_record_recv() or ssl_read()) directly into the TUN ring, with
the first byte *overwriting* the MSB of the Size field of the
TUN_HEADER.

If it's a packet type of anything other than AC_PKT_DATA we'd then
handle it immediately (as we do now) and end up re-using the same
location in the ring next time. If it's AC_PKT_DATA we just reinstate
the Size field of the header and bump the ring Tail to let the kernel
see it. If it wasn't for the fact that we need to allocate the full
possible MTU and then reduce the actual packet size once we see it,
this would even be compatible with the existing ring management in
wintun.dll. It's certainly compatible with the kernel driver.

For tun→VPN traffic we can do something similar, abusing that top byte
of the Size field. And this *would* be compatible with the existing
DLL, since we know that byte is going to be zero (which is AC_PKT_DATA)
anyway. (Here's where it would be nice to have gnutls_record_sendv() or
SSL_writev() methods which do scatter/gather, but we don't).

The bare minimum we need from Wintun here is a function that would
resize the last allocated send packet. As long as nothing has been
allocated *after* it, we could reduce its size and adjust
Session->Receive.Tail accordingly.

 ===
 ESP
 ===

For ESP we do our own encrypt/decrypt operations directly and don't
need a header. But we do actually need a *tail*, since the packet is
padded to the crypto block size and has a 'next header' field in the
final encrypted byte, e.g:

   DD DD DD DD DD DD DD DD  DD DD DD DD DD DD DD DD
   DD DD DD 01 02 03 04 05  06 07 08 09 0a 0b 0b 29
            <........... padding ..........>  ↑  ↑
                                         padlen  next_hdr

So, much as with Cisco DTLS for VPN→tun traffic we'd want to allocate
*more* space in the Wintun ring than we actually need, then shrink the
Size field later to match precisely the size of the packet. Which again
is compatible with the kernel driver but not with the ring management
currently in api/session.c.

For tun→VPN traffic it's a little more complex as we ideally want the
kernel driver to leave space *after* the packet, for us to append the
tail bytes without scribbling over the subsequent packet in the ring.

But it isn't the end of the world; we can encrypt *most* of the packet
directly from the tun ring, and we only need to copy *less* than a
single crypto block (those first three DD DD DD bytes in the final line
of the above example) into a buffer elsewhere to handle the padding
etc.

 =============
 PPP over DTLS
 =============

We just added support for the PPP-based protocols (Fortinet, F5) and
I'm not sure we even know what the DTLS-based version looks like on the
wire, do we? If the header is 4 bytes or fewer, the same nasty trick
works that I suggest for Cisco DTLS above. And a PPP header even with
accomp and pfcomp *would* fit in 4 bytes. For the TCP transports we
have an additional framing but I'm hoping those aren't there in DTLS?

If we do need a header larger than 4 bytes, then we are forced to do
things properly by adding support in the kernel driver instead of just
abusing the existing header while we know the kernel isn't looking at
it.

So, what do we want, and what's the bare minimum we actually *need*
from Wintun to be able to avoid those memcpys? 

The bare minimum is either exposing enough of the TUN_SESSION to let us
manage the rings for ourselves, or a function which can resize the
*last* allocated packet from the Tx ring before we call
WintunSendPacket() on it. That's purely userspace in wintun.dll.

The next request would be to expand the TUN_HEADER to include head/tail
space, and a parameter in the TUN_REGISTER_RINGS structure which
configures the amount of head/tail space to leave between received
packets. That's a change in the kernel API and is more complex to
manage, and as noted we *could* live without it for now although it's
kind of ugly, still involves *some* copying at the tail of outbound ESP
packets, and depends on those PPP headers not exceeding the 4 bytes
that are currently available for us to abuse :)

[-- Attachment #2: smime.p7s --]
[-- Type: application/x-pkcs7-signature, Size: 5174 bytes --]

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Allowing space for packet headers in Wintun Tx/Rx
  2021-04-08 14:37   ` David Woodhouse
@ 2021-04-08 16:42     ` Daniel Lenski
  2021-04-08 17:10       ` David Woodhouse
  0 siblings, 1 reply; 13+ messages in thread
From: Daniel Lenski @ 2021-04-08 16:42 UTC (permalink / raw)
  To: David Woodhouse; +Cc: WireGuard mailing list

On Thu, Apr 8, 2021 at 7:37 AM David Woodhouse <dwmw2@infradead.org> wrote:
>  =============
>  PPP over DTLS
>  =============
>
> We just added support for the PPP-based protocols (Fortinet, F5) and
> I'm not sure we even know what the DTLS-based version looks like on the
> wire, do we? If the header is 4 bytes or fewer, the same nasty trick
> works that I suggest for Cisco DTLS above. And a PPP header even with
> accomp and pfcomp *would* fit in 4 bytes. For the TCP transports we
> have an additional framing but I'm hoping those aren't there in DTLS?
>
> If we do need a header larger than 4 bytes, then we are forced to do
> things properly by adding support in the kernel driver instead of just
> abusing the existing header while we know the kernel isn't looking at
> it.

This is probably too much "inside baseball" for the non-(OpenConnect
developers) here, but I *have* confirmed that the PPP-over-DTLS
encapsulation is identical to the PPP-over-TLS encapsulation for the 2
PPP-based protocols that we support already. Both F5 and Fortinet
essentially opted for the thinnest veneer of UDP-ization possible for
their protocols.

> So, what do we want, and what's the bare minimum we actually *need*
> from Wintun to be able to avoid those memcpys?
>
> The bare minimum is either exposing enough of the TUN_SESSION to let us
> manage the rings for ourselves, or a function which can resize the
> *last* allocated packet from the Tx ring before we call
> WintunSendPacket() on it. That's purely userspace in wintun.dll.
>
> The next request would be to expand the TUN_HEADER to include head/tail
> space, and a parameter in the TUN_REGISTER_RINGS structure which
> configures the amount of head/tail space to leave between received
> packets. That's a change in the kernel API and is more complex to
> manage, and as noted we *could* live without it for now although it's
> kind of ugly, still involves *some* copying at the tail of outbound ESP
> packets, and depends on those PPP headers not exceeding the 4 bytes
> that are currently available for us to abuse :)

The tl;dr for OpenConnect is that we really would need support for
arbitrary head/tail space in order not to have to do *any* memcpy.

Dan

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Allowing space for packet headers in Wintun Tx/Rx
  2021-04-08 16:42     ` Daniel Lenski
@ 2021-04-08 17:10       ` David Woodhouse
  2021-04-08 17:37         ` Daniel Lenski
  2021-04-10 13:38         ` Simon Rozman
  0 siblings, 2 replies; 13+ messages in thread
From: David Woodhouse @ 2021-04-08 17:10 UTC (permalink / raw)
  To: Daniel Lenski; +Cc: WireGuard mailing list

[-- Attachment #1: Type: text/plain, Size: 2084 bytes --]

On Thu, 2021-04-08 at 09:42 -0700, Daniel Lenski wrote:
> On Thu, Apr 8, 2021 at 7:37 AM David Woodhouse <dwmw2@infradead.org> wrote:
> >  =============
> >  PPP over DTLS
> >  =============
> > 
> > We just added support for the PPP-based protocols (Fortinet, F5) and
> > I'm not sure we even know what the DTLS-based version looks like on the
> > wire, do we? If the header is 4 bytes or fewer, the same nasty trick
> > works that I suggest for Cisco DTLS above. And a PPP header even with
> > accomp and pfcomp *would* fit in 4 bytes. For the TCP transports we
> > have an additional framing but I'm hoping those aren't there in DTLS?
> > 
> > If we do need a header larger than 4 bytes, then we are forced to do
> > things properly by adding support in the kernel driver instead of just
> > abusing the existing header while we know the kernel isn't looking at
> > it.
> 
> This is probably too much "inside baseball" for the non-(OpenConnect
> developers) here, but I *have* confirmed that the PPP-over-DTLS
> encapsulation is identical to the PPP-over-TLS encapsulation for the 2
> PPP-based protocols that we support already. Both F5 and Fortinet
> essentially opted for the thinnest veneer of UDP-ization possible for
> their protocols.

Ok, so that's the PPP header plus either 6 bytes for Fortinet or 4
bytes for F5? The important part for the purpose of this conversation
is "more than four".

> The tl;dr for OpenConnect is that we really would need support for
> arbitrary head/tail space in order not to have to do *any* memcpy.

Right. I'm almost relieved at that, since it's certainly a lot *nicer*
to do it that way than any of the hacks I just described.

We can extend the TUN_PACKET header structure to have Headroom and
Tailroom fields, then add a new ioctl which enables the new TUN_PACKET
format and sets the headroom/tailroom to be used in the Rx ring.
Implementing that should be relatively simple and unintrusive; the
trickiest part of it is maintaining backward compatibility with the
existing TUN_PACKET structure.

[-- Attachment #2: smime.p7s --]
[-- Type: application/x-pkcs7-signature, Size: 5174 bytes --]

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Allowing space for packet headers in Wintun Tx/Rx
  2021-04-08 17:10       ` David Woodhouse
@ 2021-04-08 17:37         ` Daniel Lenski
  2021-04-10 13:38         ` Simon Rozman
  1 sibling, 0 replies; 13+ messages in thread
From: Daniel Lenski @ 2021-04-08 17:37 UTC (permalink / raw)
  To: David Woodhouse; +Cc: WireGuard mailing list

On Thu, Apr 8, 2021 at 10:10 AM David Woodhouse <dwmw2@infradead.org> wrote:
> On Thu, 2021-04-08 at 09:42 -0700, Daniel Lenski wrote:
> > On Thu, Apr 8, 2021 at 7:37 AM David Woodhouse <dwmw2@infradead.org> wrote:
> > > If we do need a header larger than 4 bytes, then we are forced to do
> > > things properly by adding support in the kernel driver instead of just
> > > abusing the existing header while we know the kernel isn't looking at
> > > it.
> >
> > This is probably too much "inside baseball" for the non-(OpenConnect
> > developers) here, but I *have* confirmed that the PPP-over-DTLS
> > encapsulation is identical to the PPP-over-TLS encapsulation for the 2
> > PPP-based protocols that we support already. Both F5 and Fortinet
> > essentially opted for the thinnest veneer of UDP-ization possible for
> > their protocols.
>
> Ok, so that's the PPP header plus either 6 bytes for Fortinet or 4
> bytes for F5? The important part for the purpose of this conversation
> is "more than four".

Correct. We need >4 bytes to support PPP-over-DTLS headers without copying.

And we will undoubtedly find more examples in the ongoing quest to
make OpenConnect serve as The One Client For Your Crappy Proprietary
Corporate VPN to Rule Them All.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* RE: Allowing space for packet headers in Wintun Tx/Rx
  2021-04-08 17:10       ` David Woodhouse
  2021-04-08 17:37         ` Daniel Lenski
@ 2021-04-10 13:38         ` Simon Rozman
  2021-04-10 14:35           ` David Woodhouse
  1 sibling, 1 reply; 13+ messages in thread
From: Simon Rozman @ 2021-04-10 13:38 UTC (permalink / raw)
  To: David Woodhouse, Daniel Lenski; +Cc: WireGuard mailing list

[-- Attachment #1: Type: text/plain, Size: 165 bytes --]

Hi David,This is my proposal:https://git.zx2c4.com/wintun/commit/?id=eebd6aea4f75551f6e847a1d4fff857450bac6e9Awaiting review and zx2c4 approval. 😊Regards, Simon

[-- Attachment #2: smime.p7s --]
[-- Type: application/pkcs7-signature, Size: 5559 bytes --]

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Allowing space for packet headers in Wintun Tx/Rx
  2021-04-10 13:38         ` Simon Rozman
@ 2021-04-10 14:35           ` David Woodhouse
  2021-04-10 18:32             ` Daniel Lenski
  0 siblings, 1 reply; 13+ messages in thread
From: David Woodhouse @ 2021-04-10 14:35 UTC (permalink / raw)
  To: Simon Rozman, Daniel Lenski; +Cc: WireGuard mailing list

[-- Attachment #1: Type: text/plain, Size: 981 bytes --]

On Sat, 2021-04-10 at 13:38 +0000, Simon Rozman wrote:
> Hi David,This is my proposal:
> https://git.zx2c4.com/wintun/commit/?id=eebd6aea4f75551f6e847a1d4fff857450bac6e9
> Awaiting review and zx2c4 approval. 😊
> Regards, Simon

Looks good to me; thanks. Just need to work out how to cross-build it
(I can muster up a Windows VM for testing, but *building* on it is
beyond my tolerance of Windows for now).

We'll also need to be able to WintunAllocateSendPacket() of the full
possible MTU, then receive and decrypt into that, and send only the
actual size of the packet we received.

A per-packet tail would have let us do that, but I agree that we don't
want to expand the TUN_PACKET header if we can avoid doing so.

Perhaps a WintunShrinkAndSendPacket() — which can only *shrink*, of
course, and which can only be used on the *last* packet allocated,
checking that its tail *is* the Session->Receive.Tail before adjusting
the latter accordingly.

[-- Attachment #2: smime.p7s --]
[-- Type: application/x-pkcs7-signature, Size: 5174 bytes --]

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Allowing space for packet headers in Wintun Tx/Rx
  2021-04-10 14:35           ` David Woodhouse
@ 2021-04-10 18:32             ` Daniel Lenski
  2021-04-12 11:38               ` Simon Rozman
  0 siblings, 1 reply; 13+ messages in thread
From: Daniel Lenski @ 2021-04-10 18:32 UTC (permalink / raw)
  To: David Woodhouse; +Cc: Simon Rozman, WireGuard mailing list

On Sat, Apr 10, 2021 at 7:35 AM David Woodhouse <dwmw2@infradead.org> wrote:
> On Sat, 2021-04-10 at 13:38 +0000, Simon Rozman wrote:
> > Hi David,This is my proposal:
> > https://git.zx2c4.com/wintun/commit/?id=eebd6aea4f75551f6e847a1d4fff857450bac6e9
> > Awaiting review and zx2c4 approval.
> > Regards, Simon
>
>
> Looks good to me; thanks. Just need to work out how to cross-build it
> (I can muster up a Windows VM for testing, but *building* on it is
> beyond my tolerance of Windows for now).

+1 to all that.

> We'll also need to be able to WintunAllocateSendPacket() of the full
> possible MTU, then receive and decrypt into that, and send only the
> actual size of the packet we received.
>
> A per-packet tail would have let us do that, but I agree that we don't
> want to expand the TUN_PACKET header if we can avoid doing so.
>
> Perhaps a WintunShrinkAndSendPacket() — which can only *shrink*, of
> course, and which can only be used on the *last* packet allocated,
> checking that its tail *is* the Session->Receive.Tail before adjusting
> the latter accordingly.

In addition to the use case for chopping ESP trailers and
less-than-full-size packets, OpenConnect has the case of "PPP packets
in HDLC-like framing" which need to be "un-HDLC-ed" in a way that can
only cause them to shrink.
(https://gitlab.com/openconnect/openconnect/blob/master/ppp.c#L102-158)

There are two cases worth considering where the packet size could
actually *expand*:

1) Some VPN protocols support compression of the tunneled packets. It
would be bad behavior to use this to stuff a packet of >(advertised
MTU) bytes in <(advertised MTU) bytes, but it wouldn't surprise me if
it exists in the wild. We now deal with receipt of
larger-than-expected-MTU packets in OpenConnect in a relatively
uniform way: allocate MAX(mtu, 16384) bytes for packets coming from
the VPN (if using TLS transport) or MAX(mtu, 2048) if using DTLS.
2) Some VPN protocols concatenate multiple packets into a single
aggregate on the wire. On Linux we can decrypt, truncate, and send to
the tunnel interface without further copying.

Case (1) can be handled with overallocate-and-shrink. Case (2) is
pretty rare among the protocols that OpenConnect supports, so fallback
to memcpy seems fine.

Dan

^ permalink raw reply	[flat|nested] 13+ messages in thread

* RE: Allowing space for packet headers in Wintun Tx/Rx
  2021-04-10 18:32             ` Daniel Lenski
@ 2021-04-12 11:38               ` Simon Rozman
  2021-04-12 13:00                 ` David Woodhouse
  0 siblings, 1 reply; 13+ messages in thread
From: Simon Rozman @ 2021-04-12 11:38 UTC (permalink / raw)
  To: Daniel Lenski, David Woodhouse; +Cc: WireGuard mailing list

Hi,

> > Looks good to me; thanks. Just need to work out how to cross-build it
> > (I can muster up a Windows VM for testing, but *building* on it is
> > beyond my tolerance of Windows for now).
> 
> +1 to all that.

Don't worry. Once Jason is back, reviews and (hopefully) approves the changes, we shall prepare an official release for you.

> > We'll also need to be able to WintunAllocateSendPacket() of the full
> > possible MTU, then receive and decrypt into that, and send only the
> > actual size of the packet we received.

How about this: https://git.zx2c4.com/wintun/commit/?id=03b6cd410c8963d1888966edf31fdc35a4c8b523

Should be backward compatible. Tested with the existing stable wireguard-windows release 0.3.10.

> There are two cases worth considering where the packet size could
> actually *expand*:
> 
> 1) Some VPN protocols support compression of the tunneled packets. It
> would be bad behavior to use this to stuff a packet of >(advertised
> MTU) bytes in <(advertised MTU) bytes, but it wouldn't surprise me if it
> exists in the wild. We now deal with receipt of larger-than-expected-MTU
> packets in OpenConnect in a relatively uniform way: allocate MAX(mtu,
> 16384) bytes for packets coming from the VPN (if using TLS transport) or
> MAX(mtu, 2048) if using DTLS.
> 2) Some VPN protocols concatenate multiple packets into a single
> aggregate on the wire. On Linux we can decrypt, truncate, and send to
> the tunnel interface without further copying.
> 
> Case (1) can be handled with overallocate-and-shrink. Case (2) is pretty
> rare among the protocols that OpenConnect supports, so fallback to
> memcpy seems fine.

Phew! Thanks. :)

Regards,
Simon

^ permalink raw reply	[flat|nested] 13+ messages in thread

* RE: Allowing space for packet headers in Wintun Tx/Rx
  2021-04-12 11:38               ` Simon Rozman
@ 2021-04-12 13:00                 ` David Woodhouse
  2021-04-12 17:03                   ` Jason A. Donenfeld
  0 siblings, 1 reply; 13+ messages in thread
From: David Woodhouse @ 2021-04-12 13:00 UTC (permalink / raw)
  To: Simon Rozman, Daniel Lenski; +Cc: WireGuard mailing list



On 12 April 2021 12:38:30 BST, Simon Rozman <simon@rozman.si> wrote:
>> > We'll also need to be able to WintunAllocateSendPacket() of the
>full
>> > possible MTU, then receive and decrypt into that, and send only the
>> > actual size of the packet we received.
>
>How about this:
>https://git.zx2c4.com/wintun/commit/?id=03b6cd410c8963d1888966edf31fdc35a4c8b523
>
>Should be backward compatible. Tested with the existing stable
>wireguard-windows release 0.3.10.

Looks good; thanks. However...

>> 2) Some VPN protocols concatenate multiple packets into a single
>> aggregate on the wire. On Linux we can decrypt, truncate, and send to
>> the tunnel interface without further copying.
>> 
>> Case (1) can be handled with overallocate-and-shrink. Case (2) is
>pretty
>> rare among the protocols that OpenConnect supports, so fallback to
>> memcpy seems fine.

Case (2) is fairly easy to handle when the L3 packet size doesn't use up the whole TUN_PACKET size, isn't it? We can loop and consume multiple smaller packets?

-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Allowing space for packet headers in Wintun Tx/Rx
  2021-04-12 13:00                 ` David Woodhouse
@ 2021-04-12 17:03                   ` Jason A. Donenfeld
  2021-04-13 22:09                     ` Jason A. Donenfeld
  0 siblings, 1 reply; 13+ messages in thread
From: Jason A. Donenfeld @ 2021-04-12 17:03 UTC (permalink / raw)
  To: David Woodhouse; +Cc: Simon Rozman, Daniel Lenski, WireGuard mailing list

Hey guys,

Sorry I'm a bit late to this thread. I'm happy to see there's a
prototype for benchmarking, though I do wonder if this is a bit of
overeager optimization? That is, why is this necessary and does it
actually help?

By returning packets back to the Wintun ring later, more of the ring
winds up being used, which in turn means more cache misses as it spans
additional cache lines. In other words, it seems like this might be
comparing the performance of memcpy+cache no-memcpy+cachemiss. Which
is better, and is it actually measurable? Is it possible that adding
this functionality actually has zero measurable impact on performance?
Given the complexity this adds, it'd be nice to see some numbers to
help make the argument, or perhaps reasoning that's more sophisticated
than my own napkin thoughts here.

Jason

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Allowing space for packet headers in Wintun Tx/Rx
  2021-04-12 17:03                   ` Jason A. Donenfeld
@ 2021-04-13 22:09                     ` Jason A. Donenfeld
  0 siblings, 0 replies; 13+ messages in thread
From: Jason A. Donenfeld @ 2021-04-13 22:09 UTC (permalink / raw)
  To: David Woodhouse; +Cc: Simon Rozman, Daniel Lenski, WireGuard mailing list

On Mon, Apr 12, 2021 at 11:03 AM Jason A. Donenfeld <Jason@zx2c4.com> wrote:
> Sorry I'm a bit late to this thread. I'm happy to see there's a
> prototype for benchmarking, though I do wonder if this is a bit of
> overeager optimization? That is, why is this necessary and does it
> actually help?
>
> By returning packets back to the Wintun ring later, more of the ring
> winds up being used, which in turn means more cache misses as it spans
> additional cache lines. In other words, it seems like this might be
> comparing the performance of memcpy+cache no-memcpy+cachemiss. Which
> is better, and is it actually measurable? Is it possible that adding
> this functionality actually has zero measurable impact on performance?
> Given the complexity this adds, it'd be nice to see some numbers to
> help make the argument, or perhaps reasoning that's more sophisticated
> than my own napkin thoughts here.

I've moved these improvements to this branch while we wait for
additional argumentation:
https://git.zx2c4.com/wintun/log/?h=sr/api-improvements

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2021-04-13 22:09 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-04-07 11:49 Allowing space for packet headers in Wintun Tx/Rx David Woodhouse
2021-04-07 23:15 ` Daniel Lenski
2021-04-08 14:37   ` David Woodhouse
2021-04-08 16:42     ` Daniel Lenski
2021-04-08 17:10       ` David Woodhouse
2021-04-08 17:37         ` Daniel Lenski
2021-04-10 13:38         ` Simon Rozman
2021-04-10 14:35           ` David Woodhouse
2021-04-10 18:32             ` Daniel Lenski
2021-04-12 11:38               ` Simon Rozman
2021-04-12 13:00                 ` David Woodhouse
2021-04-12 17:03                   ` Jason A. Donenfeld
2021-04-13 22:09                     ` Jason A. Donenfeld

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).