Development discussion of WireGuard
* Memleak with 0.0.20171221-5 on Debian stretch
@ 2018-02-11 13:48 Baptiste Jonglez
  2018-02-11 18:20 ` Daniel Kahn Gillmor
  0 siblings, 1 reply; 11+ messages in thread
From: Baptiste Jonglez @ 2018-02-11 13:48 UTC
  To: wireguard


Hi,

On an x86_64 VM with quite a lot of WireGuard traffic (~300 GB per day), I
am seeing a memory leak with wireguard 0.0.20171221-5.  System is Debian
stretch, kernel 4.9.65-3+deb9u2, wireguard package from unstable.

I have attached the memory usage reported by Munin over one month.  The
memleak is quite evident, and seems to happen in "slab_cache".  I realized
the issue today because the machine was completely unusable ("fork: cannot
allocate memory" with any command in a shell).

The machine was rebooted just at the point marked "Week 03" (for the
kernel upgrade for Meltdown, but wireguard was also upgraded), and it also
marks the beginning of the memleak.

To sum things up:

- no memleak: kernel 4.9.51-1, wireguard 0.0.20171011-1
- memleak: kernel 4.9.65-3+deb9u2, wireguard 0.0.20171221-5
- currently testing: kernel 4.9.65-3+deb9u2, wireguard 0.0.20180202-1

I will let you know if the memleak is still here with 0.0.20180202-1.
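
For anyone who wants to watch the same counter without Munin, here is a
minimal sketch that pulls the slab figures out of /proc/meminfo-style text.
It assumes that Munin's "slab_cache" series corresponds to the kernel's
Slab field; the helper names and the sample numbers are invented for
illustration and are not taken from the affected machine.

```python
# Sketch: extract the slab counters from /proc/meminfo-style text.
# Slab, SReclaimable and SUnreclaim are field names exposed by the Linux
# kernel; the function names below are hypothetical.

SLAB_FIELDS = ("Slab", "SReclaimable", "SUnreclaim")


def parse_slab_kib(meminfo_text):
    """Return {field: KiB} for the slab-related lines in meminfo_text."""
    values = {}
    for line in meminfo_text.splitlines():
        name, sep, rest = line.partition(":")
        if sep and name in SLAB_FIELDS:
            # Lines look like "Slab:            123456 kB".
            values[name] = int(rest.split()[0])
    return values


def read_slab_kib(path="/proc/meminfo"):
    """Read the live counters; only meaningful on a Linux host."""
    with open(path) as handle:
        return parse_slab_kib(handle.read())


# Invented sample values, for illustration only.
SAMPLE = """\
MemTotal:        2048000 kB
Slab:            1500000 kB
SReclaimable:      40000 kB
SUnreclaim:      1460000 kB
"""

print(parse_slab_kib(SAMPLE))
```

Calling read_slab_kib() periodically and logging the result gives a rough
equivalent of the graph. Growth concentrated in SUnreclaim is usually the
more suspicious sign, since SReclaimable can be freed under memory pressure.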

Baptiste

[-- Attachment #1.2: memory-month-zopfli.png --]
[-- Type: image/png, Size: 27099 bytes --]


* Re: Memleak with 0.0.20171221-5 on Debian stretch
  2018-02-11 13:48 Memleak with 0.0.20171221-5 on Debian stretch Baptiste Jonglez
@ 2018-02-11 18:20 ` Daniel Kahn Gillmor
  2018-02-11 18:43   ` Baptiste Jonglez
  0 siblings, 1 reply; 11+ messages in thread
From: Daniel Kahn Gillmor @ 2018-02-11 18:20 UTC
  To: Baptiste Jonglez, wireguard

Hi Baptiste--

On Sun 2018-02-11 14:48:37 +0100, Baptiste Jonglez wrote:

> On an x86_64 VM with quite a lot of WireGuard traffic (~300 GB per day), I
> am seeing a memory leak with wireguard 0.0.20171221-5.  System is Debian
> stretch, kernel 4.9.65-3+deb9u2, wireguard package from unstable.

oof, thanks for this report, and for the really useful graph
visualization.

it's troubling that the changes correlated with the memleak are both a
kernel upgrade *and* a wireguard upgrade, since that kind of conflation
might be difficult to tease apart.

i'm curious from the graph -- do you know what happened at the start of
week 6 where there's a sawtooth?

If you still see a leak with the latest wireguard, i'd appreciate if you
could test the current kernel with 0.0.20171011-1 to see whether you can
isolate the problem to the kernel.  i'm not recommending running
0.0.20171011-1 for the long term, but it should still be wire-format
compatible with other implementations and will help with debugging to
have the comparison.
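
The isolation test suggested above amounts to filling in a two-by-two
matrix of kernel and module versions. A small sketch that lists the
combinations still missing a result follows; the version strings and
outcomes come from the original report, while the function and variable
names are hypothetical.

```python
from itertools import product


def untested_combinations(kernels, modules, results):
    """List (kernel, module) pairs that have no recorded outcome yet."""
    return [pair for pair in product(kernels, modules) if pair not in results]


KERNELS = ["4.9.51-1", "4.9.65-3+deb9u2"]
MODULES = ["0.0.20171011-1", "0.0.20171221-5"]

# True means "leak observed"; both entries are taken from the report.
RESULTS = {
    ("4.9.51-1", "0.0.20171011-1"): False,
    ("4.9.65-3+deb9u2", "0.0.20171221-5"): True,
}

for kernel, module in untested_combinations(KERNELS, MODULES, RESULTS):
    print(f"not yet tested: kernel {kernel} + wireguard {module}")
```

The new kernel paired with the old module is the combination requested
above: a leak there would point at the kernel, and no leak would point at
the module or at an interaction between the two.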

regards,

        --dkg


* Re: Memleak with 0.0.20171221-5 on Debian stretch
  2018-02-11 18:20 ` Daniel Kahn Gillmor
@ 2018-02-11 18:43   ` Baptiste Jonglez
  2018-02-12  0:23     ` Jason A. Donenfeld
  2018-02-12  3:34     ` Daniel Kahn Gillmor
  0 siblings, 2 replies; 11+ messages in thread
From: Baptiste Jonglez @ 2018-02-11 18:43 UTC
  To: Daniel Kahn Gillmor; +Cc: wireguard

On 11-02-18, Daniel Kahn Gillmor wrote:
> Hi Baptiste--
> 
> On Sun 2018-02-11 14:48:37 +0100, Baptiste Jonglez wrote:
> 
> > On an x86_64 VM with quite a lot of WireGuard traffic (~300 GB per day), I
> > am seeing a memory leak with wireguard 0.0.20171221-5.  System is Debian
> > stretch, kernel 4.9.65-3+deb9u2, wireguard package from unstable.
> 
> oof, thanks for this report, and for the really useful graph
> visualization.
> 
> it's troubling that the changes correlated with the memleak are both a
> kernel upgrade *and* a wireguard upgrade, since that kind of conflation
> might be difficult to tease apart.

Yes, I *think* it's related to wireguard and not the kernel upgrade (since
far more people use the kernel than wireguard), but I'm not 100% sure.

And indeed, we could imagine it to be an issue in wireguard related to the
newer kernel...

> i'm curious from the graph -- do you know what happened at the start of
> week 6 where there's a sawtooth?

Actually, the amount of "slab_cache" didn't change at that point, it's
just the amount of application memory that dropped a bit.  I looked at the
logs, some userspace processes were being killed by the OOM-killer.
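
A minimal sketch of how such OOM-killer events can be pulled out of a
kernel log dump follows. The regular expression assumes the "Killed process
<pid> (<name>)" wording, which varies between kernel versions, and the
sample log lines are invented for illustration rather than copied from
this machine.

```python
import re

# Matches the "Killed process <pid> (<name>)" line the kernel logs after an
# OOM kill. The exact wording differs between kernel versions, so treat
# this pattern as an assumption to check against your own logs.
KILLED_RE = re.compile(r"Killed process (\d+) \(([^)]+)\)")


def find_oom_kills(log_text):
    """Return a list of (pid, process_name) tuples found in log_text."""
    return [(int(pid), name) for pid, name in KILLED_RE.findall(log_text)]


# Invented sample lines, for illustration only.
SAMPLE_LOG = """\
[12345.678] Out of memory: Kill process 4242 (munin-node) score 12 or sacrifice child
[12345.679] Killed process 4242 (munin-node) total-vm:50000kB, anon-rss:1000kB, file-rss:0kB
[12400.100] some unrelated line
"""

print(find_oom_kills(SAMPLE_LOG))
```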

> If you still see a leak with the latest wireguard, i'd appreciate if you
> could test the current kernel with 0.0.20171011-1 to see whether you can
> isolate the problem to the kernel.  i'm not recommending running
> 0.0.20171011-1 for the long term, but it should still be wire-format
> compatible with other implementations and will help with debugging to
> have the comparison.

Excellent suggestion!

It does look like 0.0.20180202-1 still has the memleak.  I will leave it
running a few more days to be certain, and then switch to 0.0.20171011-1.
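
One way to turn "a few more days" of samples into a number is a
least-squares slope over (timestamp, slab size) pairs. The sketch below
is only an illustration of that idea; the function name and the sample
values are hypothetical, not measurements from this machine.

```python
SECONDS_PER_DAY = 86400


def growth_per_day(samples):
    """Least-squares slope of (timestamp_seconds, value) pairs, per day."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_v = sum(v for _, v in samples) / n
    # Covariance of time and value, divided by the variance of time.
    num = sum((t - mean_t) * (v - mean_v) for t, v in samples)
    den = sum((t - mean_t) ** 2 for t, _ in samples)
    if den == 0:
        raise ValueError("all samples share the same timestamp")
    return num / den * SECONDS_PER_DAY


# Invented samples: one reading per day, slab size in KiB.
SAMPLES = [(0, 100000), (86400, 150000), (172800, 200000)]

print(round(growth_per_day(SAMPLES)), "KiB per day")
```

A slope that stays clearly positive over several days, with no matching
growth in workload, is consistent with a leak; a slope near zero suggests
the cache has simply reached a steady state.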

Baptiste


* Re: Memleak with 0.0.20171221-5 on Debian stretch
  2018-02-11 18:43   ` Baptiste Jonglez
@ 2018-02-12  0:23     ` Jason A. Donenfeld
  2018-02-12  7:35       ` Baptiste Jonglez
  2018-02-12  3:34     ` Daniel Kahn Gillmor
  1 sibling, 1 reply; 11+ messages in thread
From: Jason A. Donenfeld @ 2018-02-12  0:23 UTC
  To: Baptiste Jonglez; +Cc: WireGuard mailing list

Hey Baptiste,

Thanks for the detailed report. Graphs like that are quite helpful.
I'm just back from a long weekend, so sorry for not having a chance to
look at this sooner.

I'm first curious about the basic "control group" issue Daniel
mentioned -- it's probably important to isolate if it's the new kernel
or the new module, or some complex interaction of the two.

Secondly, I'm wondering if you tend to do "anything strange". For
example -- are you setting up and taking down the device often in an
automated way? Or reconfiguring the interface (via wg(8), for example)
often in an automated way? Or is the sustained day-in-day-out workload
that leads to this graph simply forwarding and encrypting/decrypting
packets as usual? If it's the latter, does this device tend to encrypt
or decrypt more, or both equally? In either case, I'm not sure too
much has changed between those version spans you gave, with regards to
the general packet encryption/decryption path, so I suspect this bug
will take some hunting to track down.

Thanks again for the report. Let me know about the kernel version situation.

Jason


* Re: Memleak with 0.0.20171221-5 on Debian stretch
  2018-02-11 18:43   ` Baptiste Jonglez
  2018-02-12  0:23     ` Jason A. Donenfeld
@ 2018-02-12  3:34     ` Daniel Kahn Gillmor
  1 sibling, 0 replies; 11+ messages in thread
From: Daniel Kahn Gillmor @ 2018-02-12  3:34 UTC
  To: Baptiste Jonglez; +Cc: wireguard

On Sun 2018-02-11 19:43:12 +0100, Baptiste Jonglez wrote:
> On 11-02-18, Daniel Kahn Gillmor wrote:
>
>> i'm curious from the graph -- do you know what happened at the start of
>> week 6 where there's a sawtooth?
>
> Actually, the amount of "slab_cache" didn't change at that point, it's
> just the amount of application memory that dropped a bit.  I looked at the
> logs, some userspace processes were being killed by the OOM-killer.

ah right, this makes sense, i'd forgotten that these munin
RAM-consumption graphs are "stacked".  Thanks for tracking down the
oom-killer event anyway.

hopefully we can sort out the rest soon, thanks for raising it here.

           --dkg


* Re: Memleak with 0.0.20171221-5 on Debian stretch
  2018-02-12  0:23     ` Jason A. Donenfeld
@ 2018-02-12  7:35       ` Baptiste Jonglez
  2018-02-12  7:42         ` Baptiste Jonglez
  0 siblings, 1 reply; 11+ messages in thread
From: Baptiste Jonglez @ 2018-02-12  7:35 UTC
  To: Jason A. Donenfeld; +Cc: WireGuard mailing list

Hi Jason,

On 12-02-18, Jason A. Donenfeld wrote:
> Secondly, I'm wondering if you tend to do "anything strange". For
> example -- are you setting up and taking down the device often in an
> automated way? Or reconfiguring the interface (via wg(8), for example)
> often in an automated way? Or is the sustained day-in-day-out workload
> that leads to this graph simply forwarding and encrypting/decrypting
> packets as usual? If it's the latter, does this device tend to encrypt
> or decrypt more, or both equally?

It's the latter "day-in-day-out" option: the system has a single wireguard
interface, which is configured once at boot-time, and then used
extensively to forward traffic.  It tends to encrypt more than it
decrypts.

> In either case, I'm not sure too much has changed between those version
> spans you gave, with regards to the general packet encryption/decryption
> path, so I suspect this bug will take some hunting to track down.
> 
> Thanks again for the report. Let me know about the kernel version situation.


* Re: Memleak with 0.0.20171221-5 on Debian stretch
  2018-02-12  7:35       ` Baptiste Jonglez
@ 2018-02-12  7:42         ` Baptiste Jonglez
  2018-02-12 11:04           ` Jason A. Donenfeld
  0 siblings, 1 reply; 11+ messages in thread
From: Baptiste Jonglez @ 2018-02-12  7:42 UTC
  To: Jason A. Donenfeld; +Cc: WireGuard mailing list

On 12-02-18, Baptiste Jonglez wrote:
> Hi Jason,
> 
> On 12-02-18, Jason A. Donenfeld wrote:
> > Secondly, I'm wondering if you tend to do "anything strange". For
> > example -- are you setting up and taking down the device often in an
> > automated way? Or reconfiguring the interface (via wg(8), for example)
> > often in an automated way? Or is the sustained day-in-day-out workload
> > that leads to this graph simply forwarding and encrypting/decrypting
> > packets as usual? If it's the latter, does this device tend to encrypt
> > or decrypt more, or both equally?
> 
> It's the latter "day-in-day-out" option: the system has a single wireguard
> interface, which is configured once at boot-time, and then used
> extensively to forward traffic.  It tends to encrypt more than it
> decrypts.

Actually, now that I talk about it, it's not 100% true: on this system,
there is a second wireguard interface that is not currently used (it's
provisioned to connect a future router that is not yet deployed).

The interesting part: this interface has a single peer which has no
endpoint but a persistent keepalive.  It looks like this:

    interface: wg-router2
      public key: XXXXXXXXXXXXXXXXXX
      private key: (hidden)
      listening port: 56008

    peer: YYYYYYYYYYY
      allowed ips: 0.0.0.0/0, ::/0
      persistent keepalive: every 25 seconds

Maybe wireguard allocates something to send the persistent keepalive, then
bails out because we don't know the endpoint of the peer?

I have taken this second interface down, but it has not released any
memory.  I am now leaving it up without the persistent keepalive, just in
case something interesting happens.
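
To check other machines for the same configuration, a rough sketch that
scans `wg show`-style text for peers with a persistent keepalive but no
endpoint follows. It assumes the human-readable layout shown above, where
the "endpoint" line is simply absent when no endpoint is known; the
function name is hypothetical, and the second peer in the sample (with a
documentation-range address) is invented for contrast.

```python
def peers_with_keepalive_but_no_endpoint(wg_show_text):
    """Return (interface, peer) pairs with keepalive set and no endpoint."""
    results = []
    interface = None
    peer = None
    fields = {}

    def flush():
        # Record the peer collected so far if it matches the pattern.
        if (peer is not None
                and "persistent keepalive" in fields
                and "endpoint" not in fields):
            results.append((interface, peer))

    for raw in wg_show_text.splitlines():
        key, sep, value = raw.strip().partition(":")
        if not sep:
            continue  # blank line or text without a "key: value" shape
        key, value = key.strip(), value.strip()
        if key == "interface":
            flush()
            interface, peer, fields = value, None, {}
        elif key == "peer":
            flush()
            peer, fields = value, {}
        elif peer is not None:
            fields[key] = value
    flush()
    return results


SAMPLE = """\
interface: wg-router2
  public key: XXXXXXXXXXXXXXXXXX
  private key: (hidden)
  listening port: 56008

peer: YYYYYYYYYYY
  allowed ips: 0.0.0.0/0, ::/0
  persistent keepalive: every 25 seconds

peer: ZZZZZZZZZZZ
  endpoint: 192.0.2.1:51820
  allowed ips: 10.0.0.2/32
  persistent keepalive: every 25 seconds
"""

print(peers_with_keepalive_but_no_endpoint(SAMPLE))
```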

Baptiste


* Re: Memleak with 0.0.20171221-5 on Debian stretch
  2018-02-12  7:42         ` Baptiste Jonglez
@ 2018-02-12 11:04           ` Jason A. Donenfeld
  2018-02-13 13:17             ` Baptiste Jonglez
  0 siblings, 1 reply; 11+ messages in thread
From: Jason A. Donenfeld @ 2018-02-12 11:04 UTC
  To: Baptiste Jonglez; +Cc: WireGuard mailing list

Hey Baptiste,

On Mon, Feb 12, 2018 at 8:42 AM, Baptiste Jonglez
<baptiste@bitsofnetworks.org> wrote:
> Actually, now that I talk about it, it's not 100% true: on this system,
> there is a second wireguard interface that is not currently used (it's
> provisioned to connect a future router that is not yet deployed).
>
> The interesting part: this interface has a single peer which has no
> endpoint but a persistent keepalive.

That's a super useful observation! I'm guessing this will fix it:
https://git.zx2c4.com/WireGuard/commit/?id=c5c22fb9bad1807a612b6055e0049d68f4600605

I'm still analyzing everything to find other places where I might have
missed something, but hopefully the above does it.

Jason


* Re: Memleak with 0.0.20171221-5 on Debian stretch
  2018-02-12 11:04           ` Jason A. Donenfeld
@ 2018-02-13 13:17             ` Baptiste Jonglez
  2018-02-18 20:39               ` Jason A. Donenfeld
  0 siblings, 1 reply; 11+ messages in thread
From: Baptiste Jonglez @ 2018-02-13 13:17 UTC
  To: Jason A. Donenfeld; +Cc: WireGuard mailing list

On 12-02-18, Jason A. Donenfeld wrote:
> Hey Baptiste,
> 
> On Mon, Feb 12, 2018 at 8:42 AM, Baptiste Jonglez
> <baptiste@bitsofnetworks.org> wrote:
> > Actually, now that I talk about it, it's not 100% true: on this system,
> > there is a second wireguard interface that is not currently used (it's
> > provisioned to connect a future router that is not yet deployed).
> >
> > The interesting part: this interface has a single peer which has no
> > endpoint but a persistent keepalive.

It seems to be a valid hypothesis: after I disabled persistent keepalives
on this interface (delete interface, remove persistent keepalive from
configuration, create interface again), memory usage has stopped growing:

  https://files.polyno.me/tmp/memory-leak-wireguard-annotated.png
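
The "remove persistent keepalive from configuration" step can be sketched
as a small text filter. This assumes the interface is described in a
wg(8)-style configuration file that uses the PersistentKeepalive key; the
thread does not say which format was actually used, and the function name
and sample file are hypothetical.

```python
def drop_persistent_keepalive(config_text):
    """Return config_text without any PersistentKeepalive assignments."""
    kept = []
    for line in config_text.splitlines():
        # Compare only the key on the left of "=", ignoring case and
        # surrounding whitespace; commented-out lines are left untouched.
        key = line.split("=", 1)[0].strip().lower()
        if key == "persistentkeepalive":
            continue
        kept.append(line)
    return "\n".join(kept) + "\n" if kept else ""


# Invented sample configuration mirroring the interface described earlier.
SAMPLE_CONFIG = """\
[Interface]
PrivateKey = XXXXXXXXXXXXXXXXXX
ListenPort = 56008

[Peer]
PublicKey = YYYYYYYYYYY
AllowedIPs = 0.0.0.0/0, ::/0
PersistentKeepalive = 25
"""

print(drop_persistent_keepalive(SAMPLE_CONFIG))
```

The filtered text still has to be applied by recreating or reconfiguring
the interface, as described in the steps above.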

> That's a super useful observation! I'm guessing this will fix it:
> https://git.zx2c4.com/WireGuard/commit/?id=c5c22fb9bad1807a612b6055e0049d68f4600605

Nice, thanks!  I'm looking forward to testing the next release then.

> I'm still analyzing everything to find other places where I might have
> missed something, but hopefully the above does it.

Baptiste


* Re: Memleak with 0.0.20171221-5 on Debian stretch
  2018-02-13 13:17             ` Baptiste Jonglez
@ 2018-02-18 20:39               ` Jason A. Donenfeld
  2018-02-22  7:45                 ` Baptiste Jonglez
  0 siblings, 1 reply; 11+ messages in thread
From: Jason A. Donenfeld @ 2018-02-18 20:39 UTC
  To: Baptiste Jonglez; +Cc: WireGuard mailing list

> Nice, thanks!  I'm looking forward to testing the next release then.

Let me know if the problem goes away with the snapshot I just released.

Jason


* Re: Memleak with 0.0.20171221-5 on Debian stretch
  2018-02-18 20:39               ` Jason A. Donenfeld
@ 2018-02-22  7:45                 ` Baptiste Jonglez
  0 siblings, 0 replies; 11+ messages in thread
From: Baptiste Jonglez @ 2018-02-22  7:45 UTC
  To: Jason A. Donenfeld; +Cc: WireGuard mailing list

On 18-02-18, Jason A. Donenfeld wrote:
> > Nice, thanks!  I'm looking forward to testing the next release then.
> 
> Let me know if the problem goes away with the snapshot I just released.

It does, thanks!  I am now using 0.0.20180218-1 which does not have the
memleak anymore.

Baptiste


