From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path: paul@mjr.org
Received: from krantz.zx2c4.com (localhost [127.0.0.1])
	by krantz.zx2c4.com (ZX2C4 Mail Server) with ESMTP id d9dbf48e
	for ; Tue, 12 Jun 2018 21:34:48 +0000 (UTC)
Received: from mjr.org (mjr.org [212.13.216.238])
	by krantz.zx2c4.com (ZX2C4 Mail Server) with ESMTP id 82915f8f
	for ; Tue, 12 Jun 2018 21:34:48 +0000 (UTC)
Received: from localhost (localhost [127.0.0.1])
	by mjr.org (Postfix) with ESMTP id 1A18416C386
	for ; Tue, 12 Jun 2018 22:38:36 +0100 (BST)
Received: from mjr.org ([127.0.0.1])
	by localhost (mjr.org [127.0.0.1]) (amavisd-new, port 10024)
	with LMTP id 3Hat1Iy_SPkc
	for ; Tue, 12 Jun 2018 22:38:36 +0100 (BST)
Received: from brix.local (cpc98556-croy24-2-0-cust77.19-2.cable.virginm.net [82.34.227.78])
	by mjr.org (Postfix) with ESMTP id EC3D916C385
	for ; Tue, 12 Jun 2018 22:38:35 +0100 (BST)
Message-ID: <303dc8d833bd9b9e57c3c013c37c321a3dd31280.camel@mjr.org>
Subject: Re: Kernel lockup with (debian) 4.16.0-2-rt-amd64
From: Paul Hedderly
To: WireGuard mailing list
Date: Tue, 12 Jun 2018 22:38:35 +0100
In-Reply-To: <46bd903565f6b1114b1d9f6bafa7db77bf3b5090.camel@mjr.org>
References: <46bd903565f6b1114b1d9f6bafa7db77bf3b5090.camel@mjr.org>
Content-Type: text/plain; charset="UTF-8"
Mime-Version: 1.0
List-Id: Development discussion of WireGuard
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,

On Tue, 2018-06-12 at 21:00 +0100, Paul Hedderly wrote:
> Loving wireguard but I'm getting failures running the Debian realtime
> kernel. I first noticed that the wg link was freezing for 20-30
> seconds at a time, and then the machine would freeze.

Sorry I meant to add this info:

prh@brix:~$ sudo modinfo wireguard
filename:       /lib/modules/4.16.0-2-rt-amd64/updates/dkms/wireguard.ko
alias:          net-pf-16-proto-16-family-wireguard
alias:          rtnl-link-wireguard
version:        0.0.20180531-1
author:         Jason A. Donenfeld
description:    Fast, secure, and modern VPN tunnel
license:        GPL v2
srcversion:     6ED5AE02FC2B8D8E9EA3A3D
depends:        udp_tunnel,ip6_udp_tunnel
retpoline:      Y
name:           wireguard
vermagic:       4.16.0-2-rt-amd64 SMP preempt mod_unload modversions

prh@brix:~$ dpkg -l|grep wireg
ii  wireguard        0.0.20180531-1  all    fast, modern, secure kernel VPN tunnel (metapackage)
ii  wireguard-dkms   0.0.20180531-1  all    fast, modern, secure kernel VPN tunnel (DKMS version)
ii  wireguard-tools  0.0.20180531-1  amd64  fast, modern, secure kernel VPN tunnel (userland utilities)
>

I think that is the latest release. I can raise this with the realtime
folk too if that would help - I'm not sure where the problem would lie
really.

Thanks

> For example now, before the inevitable freeze:
>
> http://dpaste.com/1WFGS46
>
> from 3820.516865 seconds in
>
> [ 3820.516865] BUG: scheduling while atomic: kworker/1:2/17295/0x00000002
> [ 3820.516865] Modules linked ...
> [ 3820.516926] Preemption disabled at:
> [ 3820.516932] [] kernel_fpu_begin+0xf/0x20
> [ 3820.516934] CPU: 1 PID: 17295 Comm: kworker/1:2 Tainted: G U O 4.16.0-2-rt-amd64 #1 Debian 4.16.12-1
> [ 3820.516935] Hardware name: Dell Inc. PowerEdge T20/0VD5HY, BIOS A06 01/27/2015
> [ 3820.516940] Workqueue: wg-crypt-wg0 packet_encrypt_worker [wireguard]
> [ 3820.516940] Call Trace:
> [ 3820.516946] dump_stack+0x5c/0x85
> [ 3820.516948] ? kernel_fpu_begin+0xf/0x20
> [ 3820.516950] __schedule_bug+0x73/0xc0
> [ 3820.516953] __schedule+0x5a1/0x6e0
>
>
> Is there any more info needed? I think I'm going to drop the rt
> kernel for now because I've had 4 lockups in 24hrs (since moving to
> the rt kernel)
>
> Is this a known problem? I'm guessing that wg hasn't been tested much
> with the rt patchset.
>
> With a previous freeze it was preceded by thousands of:
>
> Jun 12 18:11:40 brix /usr/lib/gdm3/gdm-x-session[9135]: ERROR block_reap:328: [bandwidth] bad exit code 1
> Jun 12 18:11:45 brix /usr/lib/gdm3/gdm-x-session[9135]: ERROR block_reap:328: [bandwidth] bad exit code 1
> Jun 12 18:11:50 brix /usr/lib/gdm3/gdm-x-session[9135]: ERROR block_reap:328: [bandwidth] bad exit code 1
> Jun 12 18:11:55 brix /usr/lib/gdm3/gdm-x-session[9135]: ERROR block_reap:328: [bandwidth] bad exit code 1
> Jun 12 18:12:00 brix /usr/lib/gdm3/gdm-x-session[9135]: ERROR block_reap:328: [bandwidth] bad exit code 1
> Jun 12 18:12:05 brix /usr/lib/gdm3/gdm-x-session[9135]: ERROR block_reap:328: [bandwidth] bad exit code 1
> Jun 12 18:12:10 brix /usr/lib/gdm3/gdm-x-session[9135]: ERROR block_reap:328: [bandwidth] bad exit code 1
> Jun 12 18:12:15 brix /usr/lib/gdm3/gdm-x-session[9135]: ERROR block_reap:328: [bandwidth] bad exit code 1
>
> then:
>
> Jun 12 18:16:01 brix kernel: [16507.893206] CPU: 2 PID: 18331 Comm: kworker/2:2 Tainted: G U O 4.16.0-2-rt-amd64 #1 Debian 4.16.12-1
> Jun 12 18:16:01 brix kernel: [16507.893206] Hardware name: Dell Inc. PowerEdge T20/0VD5HY, BIOS A06 01/27/2015
> Jun 12 18:16:01 brix kernel: [16507.893211] Workqueue: wg-crypt-wg0 packet_encrypt_worker [wireguard]
> Jun 12 18:16:01 brix kernel: [16507.893212] Call Trace:
> Jun 12 18:16:01 brix kernel: [16507.893218] dump_stack+0x5c/0x85
> Jun 12 18:16:01 brix kernel: [16507.893220] ? kernel_fpu_begin+0xf/0x20
> Jun 12 18:16:01 brix kernel: [16507.893222] __schedule_bug+0x73/0xc0
> Jun 12 18:16:01 brix kernel: [16507.893224] __schedule+0x5a1/0x6e0
>
> And this was all interspersed with the network going up and down. A
> log of the previous failure:
>
> https://pastebin.com/eFPHXaYk
>
> Many thanks.
>
> _______________________________________________
> WireGuard mailing list
> WireGuard@lists.zx2c4.com
> https://lists.zx2c4.com/mailman/listinfo/wireguard
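
A minimal sketch, assuming the kernel log has been saved with one entry per line, for pulling each "scheduling while atomic" report out of a log and noting which workqueue it names. The function name atomic_sched_reports and the sample list are hypothetical and only for illustration; the sample lines are copied from the trace quoted in this message.

```python
import re

# Matches the bracketed uptime stamp on kernel log lines, with or without
# a syslog prefix in front of it, e.g. "[ 3820.516865] Call Trace:".
STAMP = re.compile(r"\[\s*(\d+\.\d+)\]\s*(.*)")


def atomic_sched_reports(lines):
    """Return (timestamp, workqueue) for each 'scheduling while atomic' report.

    workqueue is None if no 'Workqueue:' line follows the BUG line.
    """
    reports = []
    open_report = False  # True while we are still looking for the Workqueue line
    for line in lines:
        match = STAMP.search(line)
        if not match:
            continue  # not a kernel log entry (or a wrapped continuation line)
        stamp, text = float(match.group(1)), match.group(2)
        if "BUG: scheduling while atomic" in text:
            reports.append([stamp, None])
            open_report = True
        elif open_report and text.startswith("Workqueue:"):
            reports[-1][1] = text[len("Workqueue:"):].strip()
            open_report = False
    return [tuple(report) for report in reports]


# Sample entries taken from the trace quoted in this message.
sample = [
    "[ 3820.516865] BUG: scheduling while atomic: kworker/1:2/17295/0x00000002",
    "[ 3820.516926] Preemption disabled at:",
    "[ 3820.516940] Workqueue: wg-crypt-wg0 packet_encrypt_worker [wireguard]",
    "[ 3820.516946] dump_stack+0x5c/0x85",
]

for stamp, workqueue in atomic_sched_reports(sample):
    print(stamp, workqueue)
```

Lines that were wrapped by a mail client (such as a "[wireguard]" suffix pushed onto its own line) carry no timestamp and are skipped, so the log is best fed in unwrapped.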