Development discussion of WireGuard
From: "Björn Töpel" <bjorn@kernel.org>
To: "Jason A. Donenfeld" <Jason@zx2c4.com>
Cc: wireguard@lists.zx2c4.com, Dmitry Vyukov <dvyukov@google.com>
Subject: Re: [PATCH RFC v1] wireguard: queueing: get rid of per-peer ring buffers
Date: Thu, 18 Feb 2021 14:49:52 +0100
Message-ID: <CAJ+HfNjNBUg9rvFtiuvNDP3KKmjGg50O+23c6KJvtGfJ2Qf+bA@mail.gmail.com>
In-Reply-To: <20210208133816.45333-1-Jason@zx2c4.com>

On Mon, 8 Feb 2021 at 14:47, Jason A. Donenfeld <Jason@zx2c4.com> wrote:
>
> Having two ring buffers per-peer means that every peer results in two
> massive ring allocations. On an 8-core x86_64 machine, this commit
> reduces the per-peer allocation from 18,688 bytes to 1,856 bytes, which
> is a 90% reduction. Ninety percent! With some single-machine
> deployments approaching 400,000 peers, we're talking about a reduction
> from 7 gigs of memory down to 700 megs of memory.
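
(For what it's worth, the arithmetic checks out: 1 - 1,856/18,688 ≈ 0.90,
and at 400,000 peers that is roughly 400,000 * 18,688 bytes ≈ 7.5 GB
before versus 400,000 * 1,856 bytes ≈ 0.74 GB after.)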
>
> In order to get rid of these per-peer allocations, this commit switches
> to using a list-based queueing approach. Currently GSO fragments are
> chained together using the skb->next pointer, so we form the per-peer
> queue around the unused skb->prev pointer, which makes sense because the
> links are pointing backwards. Multiple cores can write into the queue at
> any given time, because its writes occur in the start_xmit path or in
> the udp_recv path. But reads happen in a single workqueue item per-peer,
> amounting to a multi-producer, single-consumer paradigm.
>
> The MPSC queue is implemented locklessly and never blocks. However, it
> is not linearizable (though it is serializable), with a very tight and
> unlikely race on writes, which, when hit (about 0.15% of the time on a
> fully loaded 16-core x86_64 system), causes the queue reader to
> terminate early. However, because every packet sent queues up the same
> workqueue item after it is fully added, the queue resumes again, and
> stopping early isn't actually a problem, since at that point the packet
> wouldn't have yet been added to the encryption queue. These properties
> allow us to avoid disabling interrupts or spinning.
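
To make the linking scheme concrete, here is a minimal userspace sketch of
the same design (the names are mine and I'm using C11 atomics instead of
the kernel primitives; in the patch the node is the sk_buff itself, the
link is the reused skb->prev, and the stub element is embedded in struct
prev_queue). The ordering below is chosen conservatively rather than
copied from the patch. The early NULL returns in mpsc_pop() are the
"terminate early" case described above:

#include <stdatomic.h>
#include <stddef.h>

struct node {
	struct node *_Atomic next;	/* plays the role of skb->prev */
};

struct mpsc_queue {
	struct node *_Atomic head;	/* producers publish here with an exchange */
	struct node *tail;		/* consumer-private cursor */
	struct node stub;		/* placeholder for the empty queue */
};

static void mpsc_init(struct mpsc_queue *q)
{
	atomic_init(&q->stub.next, NULL);
	atomic_init(&q->head, &q->stub);
	q->tail = &q->stub;
}

/* Multiple producers may run this concurrently.  The exchange serializes
 * them on head; linking the previous head to the new node is a second,
 * separate store, so there is a short window in which the chain from
 * tail to head is broken. */
static void mpsc_push(struct mpsc_queue *q, struct node *n)
{
	struct node *prev;

	atomic_store_explicit(&n->next, NULL, memory_order_relaxed);
	prev = atomic_exchange_explicit(&q->head, n, memory_order_acq_rel);
	atomic_store_explicit(&prev->next, n, memory_order_release);
}

/* Single consumer.  Returns NULL when the queue is empty or when it
 * catches a producer inside the push window above; in the latter case
 * the producer requeues the worker once its push completes, so giving
 * up early is harmless. */
static struct node *mpsc_pop(struct mpsc_queue *q)
{
	struct node *tail = q->tail;
	struct node *next = atomic_load_explicit(&tail->next, memory_order_acquire);

	if (tail == &q->stub) {
		if (!next)
			return NULL;	/* empty (or first push not linked yet) */
		q->tail = next;
		tail = next;
		next = atomic_load_explicit(&next->next, memory_order_acquire);
	}
	if (next) {
		q->tail = next;
		return tail;
	}
	if (tail != atomic_load_explicit(&q->head, memory_order_acquire))
		return NULL;		/* producer mid-push: stop early */
	/* tail is the last real element: park the stub behind it so tail
	 * can be handed out while the queue stays well formed. */
	mpsc_push(q, &q->stub);
	next = atomic_load_explicit(&tail->next, memory_order_acquire);
	if (next) {
		q->tail = next;
		return tail;
	}
	return NULL;			/* raced again; the worker gets rescheduled */
}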
>
> Performance-wise, ordinarily list-based queues aren't preferable to
> ringbuffers, because of cache misses when following pointers around.
> However, we *already* have to follow the adjacent pointers when working
> through fragments, so there shouldn't actually be any change there. A
> potential downside is that dequeueing is a bit more complicated, but the
> ptr_ring structure used prior had a spinlock when dequeueing, so all
> in all the difference appears to be a wash.
>
> Actually, from profiling, the biggest performance hit, by far, of this
> commit winds up being atomic_add_unless(count, 1, max) and
> atomic_dec(count), which account for the majority of CPU time, according to
> perf. In that sense, the previous ring buffer was superior in that it
> could check if it was full by head==tail, which the list-based approach
> cannot do.
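
For illustration, that admission check is roughly the following in
userspace terms (the helper name and the MAX_QUEUED_PACKETS value are
placeholders of mine, not taken from the patch):

#include <stdatomic.h>
#include <stdbool.h>

#define MAX_QUEUED_PACKETS 1024	/* placeholder limit */

/* Equivalent of atomic_add_unless(&count, 1, MAX_QUEUED_PACKETS): bump
 * the counter only if the limit has not been reached.  Because the list
 * has no head==tail "full" test, every enqueue pays for this
 * read-modify-write on a shared counter, and every dequeue pays for the
 * matching decrement. */
static bool count_try_inc(_Atomic unsigned int *count)
{
	unsigned int c = atomic_load_explicit(count, memory_order_relaxed);

	do {
		if (c >= MAX_QUEUED_PACKETS)
			return false;
	} while (!atomic_compare_exchange_weak_explicit(count, &c, c + 1,
							memory_order_relaxed,
							memory_order_relaxed));
	return true;
}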
>
> Cc: Dmitry Vyukov <dvyukov@google.com>
> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
> ---
> Hoping to get some feedback here from people running massive deployments
> and running into ram issues, as well as Dmitry on the queueing semantics
> (the mpsc queue is his design), before I send this to Dave for merging.
> These changes are quite invasive, so I don't want to get anything wrong.
>

[...]

> diff --git a/drivers/net/wireguard/queueing.c b/drivers/net/wireguard/queueing.c
> index 71b8e80b58e1..a72380ce97dd 100644
> --- a/drivers/net/wireguard/queueing.c
> +++ b/drivers/net/wireguard/queueing.c

[...]

> +
> +static void __wg_prev_queue_enqueue(struct prev_queue *queue, struct sk_buff *skb)
> +{
> +       WRITE_ONCE(NEXT(skb), NULL);
> +       smp_wmb();
> +       WRITE_ONCE(NEXT(xchg_relaxed(&queue->head, skb)), skb);
> +}
> +

I'll chime in with Toke; this MPSC queue and Dmitry's links really took
me to the "verify with pen and paper" level! Thanks!

I'd replace the smp_wmb()/xchg_relaxed() combination above with an
xchg_release(), which might perform better on some platforms. It would
also pair more nicely with the load-acquire (ldacq) on the dequeue side.
:-P In general, it would be nice to have some wording on how the fences
pair; it would help readers (like me!) a lot.
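
Concretely, I'm thinking of something like this (untested sketch; it
assumes the dequeue side keeps its smp_load_acquire() of NEXT()):

static void __wg_prev_queue_enqueue(struct prev_queue *queue, struct sk_buff *skb)
{
	WRITE_ONCE(NEXT(skb), NULL);
	/* xchg_release() orders the NEXT(skb) = NULL store before skb is
	 * published in queue->head, making the separate smp_wmb()
	 * unnecessary, and it pairs with the smp_load_acquire() of NEXT()
	 * on the dequeue side. */
	WRITE_ONCE(NEXT(xchg_release(&queue->head, skb)), skb);
}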


Cheers,
Björn

[...]

Thread overview: 12+ messages
2021-02-08 13:38 Jason A. Donenfeld
2021-02-09  8:24 ` Dmitry Vyukov
2021-02-09 15:44   ` Jason A. Donenfeld
2021-02-09 16:20     ` Dmitry Vyukov
2021-02-17 18:36 ` Toke Høiland-Jørgensen
2021-02-17 22:28   ` Jason A. Donenfeld
2021-02-17 23:41     ` Toke Høiland-Jørgensen
2021-02-18 13:49 ` Björn Töpel [this message]
2021-02-18 13:53   ` Jason A. Donenfeld
2021-02-18 14:04     ` Björn Töpel
2021-02-18 14:15       ` Jason A. Donenfeld
2021-02-18 15:12         ` Björn Töpel
