Development discussion of WireGuard
* Wireguard Bug?
@ 2019-05-12 13:44 Ryan Whelan
  2019-05-12 15:41 ` Jason A. Donenfeld
                   ` (2 more replies)
  0 siblings, 3 replies; 6+ messages in thread
From: Ryan Whelan @ 2019-05-12 13:44 UTC (permalink / raw)
  To: WireGuard mailing list



I am building a system which coordinates the meshing of WireGuard devices.
Currently, all the devices run on an embedded platform (AMD Geode LX500)
and work as expected.  However, when a 64-bit KVM host is introduced for
testing, all the 32-bit hosts on the Geode platform report the following
warning and drop offline temporarily.  CPU usage on the KVM host spikes,
with all the CPU time spent in the kernel threads servicing the WG
interfaces.

I'm using kernel 4.19.41 and have seen the issue with both the latest WG
snapshot and a build from master.

Is this a bug in WG?

May 10 18:36:25 buildroot kern.warn kernel: WARNING: CPU: 0 PID: 9 at kernel/workqueue.c:1442 __queue_work+0x1d4/0x2aa
May 10 18:36:25 buildroot kern.warn kernel: CPU: 0 PID: 9 Comm: ksoftirqd/0 Tainted: G        W         4.19.41-banana #1
May 10 18:36:25 buildroot kern.warn kernel: EIP: __queue_work+0x1d4/0x2aa
May 10 18:36:25 buildroot kern.warn kernel: Code: c1 e8 4e 09 1d 00 c7 05 30 d3 5f c1 00 00 00 00 c7 45 e4 00 00 00 00 e9 76 fe ff ff 8b 3d 80 3c 60 c1 89 78 10 e9 32 ff ff ff <0f> 0b b8 01 00 00 00 e8 1c 82 00 00 a1 a8 9e 5f c1 85 c0 0f 85 3b
May 10 18:36:25 buildroot kern.warn kernel: EAX: 00000000 EBX: cfb63044 ECX: 00000003 EDX: cfb63048
May 10 18:36:25 buildroot kern.warn kernel: ESI: cfdfbb00 EDI: cfab29c0 EBP: cf847d38 ESP: cf847d1c
May 10 18:36:25 buildroot kern.warn kernel: DS: 007b ES: 007b FS: 0000 GS: 00e0 SS: 0068 EFLAGS: 00010086
May 10 18:36:25 buildroot kern.warn kernel: CR0: 80050033 CR2: a7bb5058 CR3: 0eebe000 CR4: 00000090
May 10 18:36:25 buildroot kern.warn kernel: Call Trace:
May 10 18:36:25 buildroot kern.warn kernel:  queue_work_on+0x19/0x1d
May 10 18:36:25 buildroot kern.warn kernel:  wg_packet_receive+0x4a5/0x557
May 10 18:36:25 buildroot kern.warn kernel:  ? send6+0x1fc/0x1fc
May 10 18:36:25 buildroot kern.warn kernel:  wg_receive+0x16/0x25
May 10 18:36:25 buildroot kern.warn kernel:  udp_queue_rcv_skb+0x254/0x324
May 10 18:36:25 buildroot kern.warn kernel:  udp_unicast_rcv_skb.isra.12+0x68/0x7a
May 10 18:36:25 buildroot kern.warn kernel:  __udp4_lib_rcv+0x413/0x88b
May 10 18:36:25 buildroot kern.warn kernel:  udp_rcv+0x12/0x14
May 10 18:36:25 buildroot kern.warn kernel:  ip_local_deliver_finish+0x82/0x1ff
May 10 18:36:25 buildroot kern.warn kernel:  ip_local_deliver+0xc0/0xcb
May 10 18:36:25 buildroot kern.warn kernel:  ? ip_sublist_rcv_finish+0x41/0x41
May 10 18:36:25 buildroot kern.warn kernel:  ip_rcv_finish+0x24/0x2a
May 10 18:36:25 buildroot kern.warn kernel:  ip_rcv+0xa1/0xaf
May 10 18:36:25 buildroot kern.warn kernel:  ? ip_rcv_finish_core.isra.0+0x331/0x331
May 10 18:36:25 buildroot kern.warn kernel:  __netif_receive_skb_one_core+0x3f/0x59
May 10 18:36:25 buildroot kern.warn kernel:  __netif_receive_skb+0x16/0x4f
May 10 18:36:25 buildroot kern.warn kernel:  netif_receive_skb_internal+0x26/0xaf
May 10 18:36:25 buildroot kern.warn kernel:  netif_receive_skb+0x8/0xa
May 10 18:36:25 buildroot kern.warn kernel:  rhine_napipoll+0x5ef/0x9c6
May 10 18:36:25 buildroot kern.warn kernel:  net_rx_action+0x197/0x24d
May 10 18:36:25 buildroot kern.warn kernel:  __do_softirq+0xd6/0x1ae
May 10 18:36:25 buildroot kern.warn kernel:  run_ksoftirqd+0x21/0x24
May 10 18:36:25 buildroot kern.warn kernel:  smpboot_thread_fn+0x137/0x1ea
May 10 18:36:25 buildroot kern.warn kernel:  kthread+0xbe/0xea
May 10 18:36:25 buildroot kern.warn kernel:  ? sort_range+0x18/0x18
May 10 18:36:25 buildroot kern.warn kernel:  ? __kthread_create_on_node+0x13e/0x13e
May 10 18:36:25 buildroot kern.warn kernel:  ret_from_fork+0x19/0x24
May 10 18:36:25 buildroot kern.warn kernel: ---[ end trace c200a14cd22c0ee1 ]---
[A second, nearly identical warning with the same call trace follows immediately, differing only in register values and ending with: ---[ end trace c200a14cd22c0ee2 ]---]


* Re: Wireguard Bug?
  2019-05-12 13:44 Wireguard Bug? Ryan Whelan
@ 2019-05-12 15:41 ` Jason A. Donenfeld
  2019-05-12 23:02 ` Lonnie Abelbeck
  2019-06-14 11:56 ` Jason A. Donenfeld
  2 siblings, 0 replies; 6+ messages in thread
From: Jason A. Donenfeld @ 2019-05-12 15:41 UTC (permalink / raw)
  To: Ryan Whelan; +Cc: WireGuard mailing list

Hey Ryan,

Can you confirm the following?

- You can easily reproduce this in a matter of seconds.
- The stack trace you sent is from the 32-bit machine.
- The 64-bit KVM machine, after triggering the stack trace on the 32-bit
machines, starts using tons of CPU.

Could you also send the .config of the 32-bit machine, and perhaps any
additional interesting information about the 64-bit KVM machine?

Jason

* Re: Wireguard Bug?
  2019-05-12 13:44 Wireguard Bug? Ryan Whelan
  2019-05-12 15:41 ` Jason A. Donenfeld
@ 2019-05-12 23:02 ` Lonnie Abelbeck
  2019-06-14 11:56 ` Jason A. Donenfeld
  2 siblings, 0 replies; 6+ messages in thread
From: Lonnie Abelbeck @ 2019-05-12 23:02 UTC (permalink / raw)
  To: Ryan Whelan; +Cc: WireGuard mailing list



> On May 12, 2019, at 8:44 AM, Ryan Whelan <rcwhelan@gmail.com> wrote:
> 
> I am building a system which coordinates the meshing of WireGuard devices.  Currently, all the devices run on an embedded platform (AMD Geode LX500) and work as expected.  However, when a 64-bit KVM host is introduced for testing, all the 32-bit hosts on the Geode platform report the following warning and drop offline temporarily.  CPU usage on the KVM host spikes, with all the CPU time spent in the kernel threads servicing the WG interfaces.
> 
> I'm using kernel 4.19.41 and have seen the issue with both the latest WG snapshot and a build from master.

Hi Ryan,

Did you mean "AMD Geode LX800 @ 500 MHz"? (e.g. ALIX/net5501)

If so, I have a couple of those using kernel 3.16.64 (i586) mixed with 3.16.64 (x86_64), and WG works well between them, with no issues like you reported.  iperf3 over WG runs at 23.8 Mbits/sec.

Lonnie


* Re: Wireguard Bug?
  2019-05-12 13:44 Wireguard Bug? Ryan Whelan
  2019-05-12 15:41 ` Jason A. Donenfeld
  2019-05-12 23:02 ` Lonnie Abelbeck
@ 2019-06-14 11:56 ` Jason A. Donenfeld
  2 siblings, 0 replies; 6+ messages in thread
From: Jason A. Donenfeld @ 2019-06-14 11:56 UTC (permalink / raw)
  To: Ryan Whelan; +Cc: WireGuard mailing list


Hey Ryan,

If you still have a reliable test rig for the bug, would you try
running with your kernel compiled with the attached patch?

Jason

[-- Attachment #2: willitwork.diff --]

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index d8a2084b88db..3860c09d6ac1 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -1454,6 +1454,9 @@ bool queue_work_on(int cpu, struct workqueue_struct *wq,
 
 	local_irq_save(flags);
 
+	/* Pair with the smp_wmb() in set_work_pool_and_clear_pending. */
+	smp_rmb();
+
 	if (!test_and_set_bit(WORK_STRUCT_PENDING_BIT, work_data_bits(work))) {
 		__queue_work(cpu, wq, work);
 		ret = true;

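To make the pairing in that patch comment concrete, here is a minimal
userspace sketch using C11 atomics in place of the kernel's primitives (the
names finish_work and try_queue are invented for illustration; this is an
analogy, not the workqueue code): clearing a PENDING-style flag after
publishing state needs release ordering, which plays the smp_wmb() role,
while re-testing that flag before queueing needs acquire ordering, the role
of the smp_rmb() the patch inserts. Without the acquire side, a CPU can
observe the flag clear yet still read stale state.

/* Userspace analogy of the barrier pairing above -- a sketch, not the
 * kernel's actual workqueue code. */
#include <stdatomic.h>
#include <stdio.h>

static atomic_int pending = 1; /* stands in for WORK_STRUCT_PENDING_BIT */
static int payload;            /* stands in for the work item's state   */

static void finish_work(void)
{
	payload = 42;
	/* Release ordering plays the smp_wmb() role: the payload store
	 * must be visible before the flag is observed as clear. */
	atomic_store_explicit(&pending, 0, memory_order_release);
}

static int try_queue(void)
{
	/* Acquire ordering plays the smp_rmb() role: only after it may we
	 * safely read state published before the flag was cleared. */
	if (atomic_exchange_explicit(&pending, 1, memory_order_acquire) == 0) {
		printf("queued, payload=%d\n", payload);
		return 1;
	}
	return 0;
}

int main(void)
{
	finish_work();
	return try_queue() ? 0 : 1;
}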

* Re: WireGuard Bug?
  2019-05-17  6:34 WireGuard Bug? . .
@ 2019-05-18 17:03 ` Lucian Cristian
  0 siblings, 0 replies; 6+ messages in thread
From: Lucian Cristian @ 2019-05-18 17:03 UTC (permalink / raw)
  To: wireguard



On 17.05.2019 09:34, . . wrote:
> Hi,
>
> I am using WireGuard on a Raspberry Pi 3 B+ with Raspbian Stretch and
> a 4.14.98-v7+ kernel.
> It works great for me and is very efficient; however, I tried to add
> a lot of routes on one of the "spoke/client" nodes, 517 routes to be
> exact.
> When I do this, WireGuard stops working: tcpdump shows the traffic
> being sent out the wg0 interface but never actually being processed
> by WireGuard, meaning the encapsulated packet to the "hub" never
> leaves.
>
> So I tried doing this with wg instead of wg-quick, and it works fine
> until I add a lot of routes to the routing table; the threshold seems
> to be 384. With 383 routes present, wg still works, but if I add one
> more, all previously working routes stop working; if I reduce the
> count to <=383, everything starts working again. wg itself doesn't
> mind having all those routes (wg show), but I wonder whether it tries
> to read the routing table as well for some reason?
>
> Appreciate any insight/help on this, thanks.
> Chris

Did you try using dynamic routing, or can it not be applied here? I only
have 262 routes available, so I can't confirm whether dynamic routing
would work.


Regards



* WireGuard Bug?
@ 2019-05-17  6:34 . .
  2019-05-18 17:03 ` Lucian Cristian
  0 siblings, 1 reply; 6+ messages in thread
From: . . @ 2019-05-17  6:34 UTC (permalink / raw)
  To: wireguard


[-- Attachment #1.1: Type: text/plain, Size: 1033 bytes --]

Hi,

I am using WireGuard on a Raspberry Pi 3 B+ with Raspbian Stretch and a 4.14.98-v7+ kernel.
It works great for me and is very efficient; however, I tried to add a lot of routes on one of the "spoke/client" nodes, 517 routes to be exact.
When I do this, WireGuard stops working: tcpdump shows the traffic being sent out the wg0 interface but never actually being processed by WireGuard, meaning the encapsulated packet to the "hub" never leaves.

So I tried doing this with wg instead of wg-quick, and it works fine until I add a lot of routes to the routing table; the threshold seems to be 384. With 383 routes present in the routing table, wg still works, but if I add one more, all previously working routes stop working; if I reduce the count to <=383, everything starts working again. wg itself doesn't mind having all those routes (wg show), but I wonder whether it tries to read the routing table as well for some reason?

Appreciate any insight/help on this, thanks.
Chris



Thread overview: 6+ messages
2019-05-12 13:44 Wireguard Bug? Ryan Whelan
2019-05-12 15:41 ` Jason A. Donenfeld
2019-05-12 23:02 ` Lonnie Abelbeck
2019-06-14 11:56 ` Jason A. Donenfeld
2019-05-17  6:34 WireGuard Bug? . .
2019-05-18 17:03 ` Lucian Cristian
