* soft lockup - may be related to wireguard (backported)
@ 2020-05-04  3:55 Wang Jian
  2020-05-04  5:26 ` Jason A. Donenfeld
                   ` (3 more replies)
  0 siblings, 4 replies; 10+ messages in thread

From: Wang Jian @ 2020-05-04 3:55 UTC (permalink / raw)
To: WireGuard mailing list

The following is the kernel log of the soft lockup.

# uname -a
Linux gw 5.5.0-2-amd64 #1 SMP Debian 5.5.17-1 (2020-04-15) x86_64 GNU/Linux
# cat /sys/module/wireguard/version
1.0.0

This is a Linux box running Debian testing; I think the wireguard module is
backported from 5.6. The box had a CPU usage spike during the soft lockup,
and after that wireguard stopped working: no handshake. "ifdown wg0; ifup wg0"
was not able to recover wireguard; still no handshake.

--

May  4 01:02:17 gw kernel: [27929.504771] rcu: INFO: rcu_sched self-detected stall on CPU
May  4 01:02:17 gw kernel: [27929.504859] rcu: 2-....: (5582 ticks this GP) idle=c02/1/0x4000000000000002 softirq=300509/300511 fqs=2470
May  4 01:02:17 gw kernel: [27929.504937] (t=5250 jiffies g=518541 q=1110)
May  4 01:02:17 gw kernel: [27929.504975] NMI backtrace for cpu 2
May  4 01:02:17 gw kernel: [27929.505010] CPU: 2 PID: 25094 Comm: kworker/2:1 Not tainted 5.5.0-2-amd64 #1 Debian 5.5.17-1
May  4 01:02:17 gw kernel: [27929.505076] Hardware name: /DN2800MT, BIOS MTCDT10N.86A.0169.2013.0923.1101 09/23/2013
May  4 01:02:17 gw kernel: [27929.505152] Workqueue: wg-crypt-wg0 wg_packet_tx_worker [wireguard]
May  4 01:02:17 gw kernel: [27929.505205] Call Trace:
May  4 01:02:17 gw kernel: [27929.505233] <IRQ>
May  4 01:02:17 gw kernel: [27929.505263] dump_stack+0x66/0x90
May  4 01:02:17 gw kernel: [27929.505297] nmi_cpu_backtrace.cold+0x14/0x53
May  4 01:02:17 gw kernel: [27929.505339] ? lapic_can_unplug_cpu.cold+0x3e/0x3e
May  4 01:02:17 gw kernel: [27929.505381] nmi_trigger_cpumask_backtrace+0xf6/0xf8
May  4 01:02:17 gw kernel: [27929.505427] rcu_dump_cpu_stacks+0x8f/0xbd
May  4 01:02:17 gw kernel: [27929.505466] rcu_sched_clock_irq.cold+0x1b2/0x3ab
May  4 01:02:17 gw kernel: [27929.505509] update_process_times+0x24/0x50
May  4 01:02:17 gw kernel: [27929.505549] tick_sched_handle+0x22/0x60
May  4 01:02:17 gw kernel: [27929.505585] tick_sched_timer+0x38/0x80
May  4 01:02:17 gw kernel: [27929.505620] ? tick_sched_do_timer+0x60/0x60
May  4 01:02:17 gw kernel: [27929.505660] __hrtimer_run_queues+0xf6/0x270
May  4 01:02:17 gw kernel: [27929.505700] hrtimer_interrupt+0x10e/0x240
May  4 01:02:17 gw kernel: [27929.505740] smp_apic_timer_interrupt+0x6c/0x130
May  4 01:02:17 gw kernel: [27929.505782] apic_timer_interrupt+0xf/0x20
May  4 01:02:17 gw kernel: [27929.505818] </IRQ>
May  4 01:02:17 gw kernel: [27929.505843] RIP: 0010:queue_work_on+0x27/0x40
May  4 01:02:17 gw kernel: [27929.505883] Code: 00 00 00 0f 1f 44 00 00 53 9c 58 0f 1f 44 00 00 48 89 c3 fa 66 0f 1f 44 00 00 f0 48 0f ba 2a 00 73 12 45 31 c0 48 89 df 57 9d <0f> 1f 44 00 00 44 89 c0 5b c3 e8 ea fb ff ff 41 b8 01 00 00 00 eb
May  4 01:02:17 gw kernel: [27929.506019] RSP: 0018:ffffb071c2eefaa8 EFLAGS: 00000286 ORIG_RAX: ffffffffffffff13
May  4 01:02:17 gw kernel: [27929.506080] RAX: 0000000000000001 RBX: 0000000000000286 RCX: 0000000000000000
May  4 01:02:17 gw kernel: [27929.506135] RDX: 00001966d8343dfd RSI: 0000000000000083 RDI: 0000000000000286
May  4 01:02:17 gw kernel: [27929.506191] RBP: ffff8b61aa1c0000 R08: 0000000000000001 R09: ffff8b61abdad510
May  4 01:02:17 gw kernel: [27929.506246] R10: ffff8b61a467ee98 R11: ffff8b61abdacb78 R12: ffff8b61abc43000
May  4 01:02:17 gw kernel: [27929.506301] R13: ffff8b60fef63500 R14: 0000000000000003 R15: ffff8b61a9623904
May  4 01:02:17 gw kernel: [27929.506367] wg_packet_send_staged_packets+0x320/0x5d0 [wireguard]
May  4 01:02:17 gw kernel: [27929.506426] wg_xmit+0x324/0x490 [wireguard]
May  4 01:02:17 gw kernel: [27929.506469] dev_hard_start_xmit+0x8d/0x1e0
May  4 01:02:17 gw kernel: [27929.506508] __dev_queue_xmit+0x721/0x8e0
May  4 01:02:17 gw kernel: [27929.506549] ip_finish_output2+0x19b/0x590
May  4 01:02:17 gw kernel: [27929.506604] ? nf_confirm+0xcb/0xf0 [nf_conntrack]
May  4 01:02:17 gw kernel: [27929.506648] ip_output+0x76/0xf0
May  4 01:02:17 gw kernel: [27929.506681] ? __ip_finish_output+0x1c0/0x1c0
May  4 01:02:17 gw kernel: [27929.506720] iptunnel_xmit+0x174/0x210
May  4 01:02:17 gw kernel: [27929.506761] send4+0x120/0x390 [wireguard]
May  4 01:02:17 gw kernel: [27929.506806] wg_socket_send_skb_to_peer+0x98/0xb0 [wireguard]
May  4 01:02:17 gw kernel: [27929.506860] wg_packet_tx_worker+0xa9/0x210 [wireguard]
May  4 01:02:17 gw kernel: [27929.506907] process_one_work+0x1b4/0x370
May  4 01:02:17 gw kernel: [27929.506944] worker_thread+0x50/0x3c0
May  4 01:02:17 gw kernel: [27929.506980] kthread+0xf9/0x130
May  4 01:02:17 gw kernel: [27929.510322] ? process_one_work+0x370/0x370
May  4 01:02:17 gw kernel: [27929.514059] ? kthread_park+0x90/0x90
May  4 01:02:17 gw kernel: [27929.517774] ret_from_fork+0x1f/0x40
May  4 01:02:45 gw kernel: [27957.473746] watchdog: BUG: soft lockup - CPU#2 stuck for 22s! [kworker/2:1:25094]
May  4 01:02:45 gw kernel: [27957.476765] Modules linked in: nf_conntrack_netlink sctp wireguard pppoe pppox ppp_generic slhc curve25519_x86_64 libcurve25519 libchacha20poly1305 chacha_x86_64 poly1305_x86_64 ip6_udp_tunnel udp_tunnel libblake2s blake2s_x86_64 libblake2s_generic libchacha cfg80211 rfkill 8021q garp stp mrp llc xt_mark xt_set nft_chain_nat xt_MASQUERADE xt_TCPMSS xt_nat nf_nat nft_counter xt_tcpudp ipt_REJECT nf_reject_ipv4 xt_state xt_conntrack xt_multiport nft_compat nf_tables ip_set_hash_ip ip_set nfnetlink binfmt_misc nls_ascii nls_cp437 vfat intel_powerclamp fat coretemp snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio hid_generic snd_hda_codec_hdmi evdev efi_pstore snd_hda_intel snd_intel_dspcfg snd_hda_codec snd_hda_core gma500_gfx usbhid snd_hwdep snd_pcm hid iTCO_wdt snd_timer efivars pcspkr iTCO_vendor_support drm_kms_helper snd sg watchdog soundcore i2c_algo_bit acpi_cpufreq button tcp_bbr sch_cake nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 loop drm efivarfs ip_tables x_tables autofs4
May  4 01:02:45 gw kernel: [27957.476843] ext4 crc16 mbcache jbd2 btrfs xor zstd_decompress zstd_compress raid6_pq crc32c_generic libcrc32c sd_mod ahci libahci uhci_hcd ehci_pci libata ehci_hcd e1000e scsi_mod usbcore ptp i2c_i801 lpc_ich pps_core mfd_core usb_common video [last unloaded: wireguard]
May  4 01:02:45 gw kernel: [27957.501504] CPU: 2 PID: 25094 Comm: kworker/2:1 Not tainted 5.5.0-2-amd64 #1 Debian 5.5.17-1
May  4 01:02:45 gw kernel: [27957.505274] Hardware name: /DN2800MT, BIOS MTCDT10N.86A.0169.2013.0923.1101 09/23/2013
May  4 01:02:45 gw kernel: [27957.509046] Workqueue: wg-crypt-wg0 wg_packet_tx_worker [wireguard]
May  4 01:02:45 gw kernel: [27957.512801] RIP: 0010:__xfrm_decode_session+0xd/0x800
May  4 01:02:45 gw kernel: [27957.516563] Code: 00 00 00 e8 55 be ce ff eb 8c e8 be 46 95 ff 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 0f 1f 44 00 00 55 48 89 e5 41 57 41 56 <41> 89 ce 41 55 41 54 49 89 fc 53 48 89 f3 48 83 ec 10 83 fa 02 74
May  4 01:02:45 gw kernel: [27957.520497] RSP: 0018:ffffb071c2eefb28 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13
May  4 01:02:45 gw kernel: [27957.524422] RAX: 0000000000000000 RBX: ffff8b618d13ed80 RCX: 0000000000000000
May  4 01:02:45 gw kernel: [27957.528366] RDX: 0000000000000002 RSI: ffffb071c2eefb58 RDI: ffff8b60fe81ce00
May  4 01:02:45 gw kernel: [27957.532285] RBP: ffffb071c2eefb38 R08: 0000000000000000 R09: ffff8b60ff8e5f00
May  4 01:02:45 gw kernel: [27957.536216] R10: 0000000000000000 R11: 0000000000000001 R12: ffffffff92ae7f80
May  4 01:02:45 gw kernel: [27957.540097] R13: ffffb071c2eefb58 R14: 0000000000000000 R15: 0000000000000001
May  4 01:02:45 gw kernel: [27957.543929] FS: 0000000000000000(0000) GS:ffff8b61abd00000(0000) knlGS:0000000000000000
May  4 01:02:45 gw kernel: [27957.547734] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
May  4 01:02:45 gw kernel: [27957.551506] CR2: 00007fa15c342fb8 CR3: 000000010c80a000 CR4: 00000000000006e0
May  4 01:02:45 gw kernel: [27957.555248] Call Trace:
May  4 01:02:45 gw kernel: [27957.558920] ip_route_me_harder+0x1ce/0x250
May  4 01:02:45 gw kernel: [27957.562614] nf_route_table_hook4+0xee/0x110 [nf_tables]
May  4 01:02:45 gw kernel: [27957.566288] nf_hook_slow+0x40/0xb0
May  4 01:02:45 gw kernel: [27957.569916] __ip_local_out+0xd4/0x140
May  4 01:02:45 gw kernel: [27957.573500] ? ip_forward_options.cold+0x23/0x23
May  4 01:02:45 gw kernel: [27957.577098] ip_local_out+0x18/0x50
May  4 01:02:45 gw kernel: [27957.580669] iptunnel_xmit+0x174/0x210
May  4 01:02:45 gw kernel: [27957.584210] send4+0x120/0x390 [wireguard]
May  4 01:02:45 gw kernel: [27957.587726] wg_socket_send_skb_to_peer+0x98/0xb0 [wireguard]
May  4 01:02:45 gw kernel: [27957.591377] wg_packet_tx_worker+0xa9/0x210 [wireguard]
May  4 01:02:45 gw kernel: [27957.594986] process_one_work+0x1b4/0x370
May  4 01:02:45 gw kernel: [27957.598526] worker_thread+0x50/0x3c0
May  4 01:02:45 gw kernel: [27957.602063] kthread+0xf9/0x130
May  4 01:02:45 gw kernel: [27957.605583] ? process_one_work+0x370/0x370
May  4 01:02:45 gw kernel: [27957.609112] ? kthread_park+0x90/0x90
May  4 01:02:45 gw kernel: [27957.612633] ret_from_fork+0x1f/0x40

^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: soft lockup - may be related to wireguard (backported)
  2020-05-04  3:55 soft lockup - may be related to wireguard (backported) Wang Jian
@ 2020-05-04  5:26 ` Jason A. Donenfeld
  2020-05-04 12:49   ` Wang Jian
  2020-05-04 13:49   ` Alex Xu (Hello71)
  2020-05-04 10:47 ` Serge Belyshev
  ` (2 subsequent siblings)
  3 siblings, 2 replies; 10+ messages in thread

From: Jason A. Donenfeld @ 2020-05-04 5:26 UTC (permalink / raw)
To: Wang Jian; +Cc: WireGuard mailing list

Are you routing wireguard over wireguard, or something interesting like that?

Is ipsec being used?

^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: soft lockup - may be related to wireguard (backported)
  2020-05-04  5:26 ` Jason A. Donenfeld
@ 2020-05-04 12:49   ` Wang Jian
  2020-05-04 13:49   ` Alex Xu (Hello71)
  1 sibling, 0 replies; 10+ messages in thread

From: Wang Jian @ 2020-05-04 12:49 UTC (permalink / raw)
To: Jason A. Donenfeld; +Cc: WireGuard mailing list

Jason A. Donenfeld <Jason@zx2c4.com> wrote on Mon, May 4, 2020 at 1:26 PM:
>
> Are you routing wireguard over wireguard, or something interesting like that?
>
> Is ipsec being used?

I don't think I have any fancy use cases. But wireguard over pppoe?

Other details:
- nftables' iptables-compatible mode is used, along with ipset
- the pppoe link is the default route, and wg-quick is configured to install
  an additional default route into a newly created routing table (2000)
- ipset matches are used to MARK traffic to specific destinations in the
  mangle table, PREROUTING & OUTPUT, for both v4 and v6
- ip rules are added to match the fwmark and look up the new routing table
  (2000, so traffic goes out via the wireguard interface) for both forwarded
  traffic and output traffic

Here is the list of loaded kmods.
# lsmod
Module                  Size  Used by
pppoe                  20480  2
pppox                  16384  1 pppoe
ppp_generic            49152  6 pppox,pppoe
slhc                   20480  1 ppp_generic
wireguard              94208  0
curve25519_x86_64      49152  1 wireguard
libcurve25519          49152  2 curve25519_x86_64,wireguard
libchacha20poly1305    16384  1 wireguard
chacha_x86_64          28672  1 libchacha20poly1305
poly1305_x86_64        28672  1 libchacha20poly1305
ip6_udp_tunnel         16384  1 wireguard
udp_tunnel             16384  1 wireguard
libblake2s             16384  1 wireguard
blake2s_x86_64         20480  1 libblake2s
libblake2s_generic     20480  1 blake2s_x86_64
libchacha              16384  1 chacha_x86_64
cfg80211              839680  0
rfkill                 28672  1 cfg80211
8021q                  40960  0
garp                   16384  1 8021q
stp                    16384  1 garp
mrp                    20480  1 8021q
llc                    16384  2 stp,garp
xt_mark                16384  4
xt_set                 16384  4
nft_chain_nat          16384  8
xt_MASQUERADE          20480  3
xt_TCPMSS              16384  5
xt_nat                 16384  2
nf_nat                 53248  3 xt_nat,nft_chain_nat,xt_MASQUERADE
nft_counter            16384  25
xt_tcpudp              20480  7
ipt_REJECT             16384  3
nf_reject_ipv4         16384  1 ipt_REJECT
xt_state               16384  0
xt_conntrack           16384  4
xt_multiport           20480  2
nft_compat             20480  34
nf_tables             176128  93 nft_compat,nft_counter,nft_chain_nat
ip_set_hash_ip         45056  2
ip_set_hash_net        53248  3
ip_set                 57344  3 ip_set_hash_ip,xt_set,ip_set_hash_net
nfnetlink              16384  4 nft_compat,nf_tables,ip_set
binfmt_misc            24576  1
nls_ascii              16384  1
intel_powerclamp       20480  0
coretemp               20480  0
snd_hda_codec_realtek 126976  1
nls_cp437              20480  1
vfat                   20480  1
snd_hda_codec_generic  94208  1 snd_hda_codec_realtek
fat                    86016  1 vfat
ledtrig_audio          16384  2 snd_hda_codec_generic,snd_hda_codec_realtek
snd_hda_codec_hdmi     73728  1
snd_hda_intel          53248  0
snd_intel_dspcfg       24576  1 snd_hda_intel
snd_hda_codec         163840  4 snd_hda_codec_generic,snd_hda_codec_hdmi,snd_hda_intel,snd_hda_codec_realtek
evdev                  28672  3
gma500_gfx            221184  1
efi_pstore             16384  0
snd_hda_core          102400  5 snd_hda_codec_generic,snd_hda_codec_hdmi,snd_hda_intel,snd_hda_codec,snd_hda_codec_realtek
snd_hwdep              16384  1 snd_hda_codec
snd_pcm               131072  4 snd_hda_codec_hdmi,snd_hda_intel,snd_hda_codec,snd_hda_core
drm_kms_helper        233472  1 gma500_gfx
pcspkr                 16384  0
efivars                20480  1 efi_pstore
snd_timer              40960  1 snd_pcm
iTCO_wdt               16384  0
snd                   106496  8 snd_hda_codec_generic,snd_hda_codec_hdmi,snd_hwdep,snd_hda_intel,snd_hda_codec,snd_hda_codec_realtek,snd_timer,snd_pcm
sg                     36864  0
iTCO_vendor_support    16384  1 iTCO_wdt
soundcore              16384  1 snd
i2c_algo_bit           16384  1 gma500_gfx
watchdog               28672  1 iTCO_wdt
acpi_cpufreq           28672  1
button                 24576  0
tcp_bbr                20480  36
sch_cake               32768  3
nf_conntrack          172032  5 xt_conntrack,nf_nat,xt_state,xt_nat,xt_MASQUERADE
nf_defrag_ipv6         24576  1 nf_conntrack
nf_defrag_ipv4         16384  1 nf_conntrack
loop                   40960  0
drm                   585728  3 drm_kms_helper,gma500_gfx
efivarfs               16384  1
ip_tables              32768  0
x_tables               53248  12 xt_conntrack,nft_compat,xt_multiport,xt_state,xt_tcpudp,xt_nat,xt_set,ipt_REJECT,ip_tables,xt_MASQUERADE,xt_TCPMSS,xt_mark
autofs4                53248  2
ext4                  765952  1
crc16                  16384  1 ext4
mbcache                16384  1 ext4
jbd2                  135168  1 ext4
btrfs                1478656  0
xor                    24576  1 btrfs
zstd_decompress        90112  1 btrfs
zstd_compress         188416  1 btrfs
raid6_pq              122880  1 btrfs
crc32c_generic         16384  2
libcrc32c              16384  3 nf_conntrack,nf_nat,btrfs
sd_mod                 57344  3
ahci                   40960  2
libahci                45056  1 ahci
libata                290816  2 libahci,ahci
e1000e                290816  0
ehci_pci               20480  0
uhci_hcd               53248  0
scsi_mod              253952  3 sd_mod,libata,sg
ehci_hcd               98304  1 ehci_pci
i2c_i801               32768  0
usbcore               315392  3 ehci_pci,ehci_hcd,uhci_hcd
ptp                    20480  1 e1000e
lpc_ich                28672  0
mfd_core               20480  1 lpc_ich
pps_core               24576  1 ptp
usb_common             16384  3 usbcore,ehci_hcd,uhci_hcd
video                  53248  1 gma500_gfx

^ permalink raw reply [flat|nested] 10+ messages in thread
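[Editorial note: a minimal, hypothetical sketch of the policy-routing setup described in the message above. The table number 2000 is from the mail; the ipset name `proxied` and the mark value `0x2000` are placeholders, not taken from the original configuration.]

```shell
# Hypothetical reconstruction of the described setup (v4 side only).
# Set name "proxied" and mark 0x2000 are illustrative placeholders.

# Mark traffic to selected destinations (ipset match) in the mangle table.
iptables -t mangle -A PREROUTING -m set --match-set proxied dst -j MARK --set-mark 0x2000
iptables -t mangle -A OUTPUT     -m set --match-set proxied dst -j MARK --set-mark 0x2000

# Routing table 2000 has its default route via the wireguard interface,
# and an ip rule sends marked traffic through it.
ip route add default dev wg0 table 2000
ip rule  add fwmark 0x2000 lookup 2000
```

The v6 side would mirror this with ip6tables and `ip -6 rule`/`ip -6 route`.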
* Re: soft lockup - may be related to wireguard (backported)
  2020-05-04  5:26 ` Jason A. Donenfeld
  2020-05-04 12:49   ` Wang Jian
@ 2020-05-04 13:49   ` Alex Xu (Hello71)
  2020-05-04 16:51     ` Wang Jian
  1 sibling, 1 reply; 10+ messages in thread

From: Alex Xu (Hello71) @ 2020-05-04 13:49 UTC (permalink / raw)
To: Jason A. Donenfeld, Wang Jian, wireguard

Excerpts from Jason A. Donenfeld's message of May 4, 2020 1:26 am:
> Are you routing wireguard over wireguard, or something interesting like that?
>
> Is ipsec being used?
>

This "DN2800MT" looks like an Atom board from 2012; are you trying to run a
very large bandwidth through it? I think it's plausible that buffering could
cause a large enough chunk of work for a slow CPU that the kernel could
erroneously think that the code is stuck. I don't know why that would happen
on the tx thread though, or why it would result in an RCU stall, seeing as
only a single access is made under an RCU lock in this function...

^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: soft lockup - may be related to wireguard (backported)
  2020-05-04 13:49   ` Alex Xu (Hello71)
@ 2020-05-04 16:51     ` Wang Jian
  0 siblings, 0 replies; 10+ messages in thread

From: Wang Jian @ 2020-05-04 16:51 UTC (permalink / raw)
To: Alex Xu (Hello71); +Cc: Jason A. Donenfeld, WireGuard mailing list

Alex Xu (Hello71) <alex_y_xu@yahoo.ca> wrote on Mon, May 4, 2020 at 9:49 PM:
>
> Excerpts from Jason A. Donenfeld's message of May 4, 2020 1:26 am:
> > Are you routing wireguard over wireguard, or something interesting like that?
> >
> > Is ipsec being used?
> >
>
> This "DN2800MT" looks like an Atom board from 2012; are you trying to
> run a very large bandwidth through it? I think it's plausible that
> buffering could cause a large enough chunk of work for a slow CPU that
> the kernel could erroneously think that the code is stuck. I don't know
> why that would happen on the tx thread though, or why it would result in
> an RCU stall, seeing as only a single access is made under an RCU lock
> in this function...

It's an old Atom CPU, but still capable of handling at least 100Mbps (even
with encryption).

I have a prometheus instance running on this box, so I have monitoring data
at a 15s interval. Looking at the monitoring data:

1. the event started at 1:02:00am and ended at 1:04:45am
2. peak time was 1:03:15am, for load, CPU usage, and memory usage
3. available memory dropped from 3.359GB to 2.714GB at 1:03:15am, and bounced
   back gradually from that point

It seems related to wireguard's memory use, either as a cause or a result.

^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: soft lockup - may be related to wireguard (backported)
  2020-05-04  3:55 soft lockup - may be related to wireguard (backported) Wang Jian
  2020-05-04  5:26 ` Jason A. Donenfeld
@ 2020-05-04 10:47 ` Serge Belyshev
  2020-05-04 22:55   ` Jason A. Donenfeld
  2020-05-04 22:28 ` Jason A. Donenfeld
  2020-05-05  0:05 ` Jason A. Donenfeld
  3 siblings, 1 reply; 10+ messages in thread

From: Serge Belyshev @ 2020-05-04 10:47 UTC (permalink / raw)
To: WireGuard mailing list

[-- Attachment #1: Type: text/plain, Size: 3374 bytes --]

Hi! I can reproduce a similar RCU stall with a different kernel, under
specific conditions, on a specific box:

[   54.437636] rcu: INFO: rcu_sched self-detected stall on CPU
[   54.438838] rcu: 0-...!: (2101 ticks this GP) idle=ea6/1/0x4000000000000002 softirq=604/604 fqs=0
[   54.440052] (t=2101 jiffies g=69 q=89)
[   54.441273] rcu: rcu_sched kthread starved for 2101 jiffies! g69 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402 ->cpu=0
[   54.442547] rcu: RCU grace-period kthread stack dump:
[   54.443812] rcu_sched I 0 10 2 0x80004000
[   54.443814] Call Trace:
[   54.445087] ? __schedule+0x540/0xa80
[   54.446356] schedule+0x45/0xb0
[   54.447612] schedule_timeout+0x144/0x280
[   54.448859] ? __next_timer_interrupt+0xc0/0xc0
[   54.450099] rcu_gp_kthread+0x3f0/0x840
[   54.451329] kthread+0xe6/0x120
[   54.452557] ? rcu_gp_slow.part.0+0x30/0x30
[   54.453761] ? __kthread_create_on_node+0x150/0x150
[   54.454943] ret_from_fork+0x1f/0x30
[   54.456095] NMI backtrace for cpu 0
[   54.457221] CPU: 0 PID: 2910 Comm: md5sum Not tainted 5.6.0-00001-g6e142c237f00 #1309
[   54.458355] Hardware name: Gigabyte Technology Co., Ltd. GA-MA790FX-DQ6/GA-MA790FX-DQ6, BIOS F7g 07/19/2010
[   54.459484] Call Trace:
[   54.460576] <IRQ>
[   54.461672] dump_stack+0x50/0x70
[   54.462772] nmi_cpu_backtrace.cold+0x14/0x53
[   54.463871] ? lapic_can_unplug_cpu.cold+0x3e/0x3e
[   54.464955] nmi_trigger_cpumask_backtrace+0x7c/0x89
[   54.466026] rcu_dump_cpu_stacks+0x7b/0xa9
[   54.467088] rcu_sched_clock_irq.cold+0x153/0x38a
[   54.468146] update_process_times+0x1f/0x50
[   54.469204] tick_sched_timer+0x33/0x70
[   54.470262] ? tick_sched_do_timer+0x50/0x50
[   54.471321] __hrtimer_run_queues+0xe2/0x180
[   54.472378] hrtimer_interrupt+0x109/0x240
[   54.473423] smp_apic_timer_interrupt+0x48/0x80
[   54.474461] apic_timer_interrupt+0xf/0x20
[   54.475486] </IRQ>
[   54.476495] RIP: 0033:0x556cbd33bf19
[   54.477506] Code: ce 44 8b 4b 10 c1 c9 0f 01 d1 44 89 4c 24 c8 21 ce 31 c6 01 fe 41 8d bc 01 af 0f 7c f5 89 d0 44 8b 4b 3c c1 ce 0a 31 c8 01 ce <21> f0 31 d0 01 f8 41 8d bc 12 2a c6 87 47 89 ca 41 89 ea c1 c0 07
[   54.479694] RSP: 002b:00007ffc30913ce8 EFLAGS: 00000283 ORIG_RAX: ffffffffffffff13
[   54.480813] RAX: 00000000980270bd RBX: 0000556cbe81e4e0 RCX: 00000000c35c3b1a
[   54.481943] RDX: 000000005b5e4ba7 RSI: 00000000ae8ee5ae RDI: 0000000009b5de85
[   54.483075] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
[   54.484201] R10: 0000000000000000 R11: 00000000b16eb4f8 R12: 0000000000000000
[   54.485317] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000023604445

Details and steps to reproduce:

1. kernel from git://github.com/ckolivas/linux tag: 5.6-muqss-199
2. kernel .config in the attachment
3. boot with "threadirqs"
4. launch 100% cpu load for all threads, e.g.:
   for N in {1..6}; do md5sum /dev/zero & done
5. observe that the box stops responding to pings via the wireguard interface.
6. after some time an RCU stall may be triggered (but not always).
7. further wireguard configuration details in the attachment.

Note that this is a heisenbug: it disappears when more debugging options are
enabled. I cannot trigger it on a mainline kernel or with a different
scheduler configuration, and on a different box (a Skylake laptop) with
exactly the same kernel it is very hard to trigger.
[-- Attachment #2: config.gz --]
[-- Type: application/gzip, Size: 25579 bytes --]

[-- Attachment #3: proc_cpuinfo.gz --]
[-- Type: application/gzip, Size: 775 bytes --]

[-- Attachment #4: wg-config-details.gz --]
[-- Type: application/gzip, Size: 1305 bytes --]

^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: soft lockup - may be related to wireguard (backported)
  2020-05-04 10:47 ` Serge Belyshev
@ 2020-05-04 22:55   ` Jason A. Donenfeld
  0 siblings, 0 replies; 10+ messages in thread

From: Jason A. Donenfeld @ 2020-05-04 22:55 UTC (permalink / raw)
To: Serge Belyshev, WireGuard mailing list

On 5/4/20 4:47 AM, Serge Belyshev wrote:
> Hi! I can reproduce similar RCU stall with a different kernel under
> specific conditions on a specific box:
>
> [   54.437636] rcu: INFO: rcu_sched self-detected stall on CPU
> [   54.438838] rcu: 0-...!: (2101 ticks this GP) idle=ea6/1/0x4000000000000002 softirq=604/604 fqs=0
> [   54.440052] (t=2101 jiffies g=69 q=89)
> [   54.441273] rcu: rcu_sched kthread starved for 2101 jiffies! g69 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402 ->cpu=0
> [   54.442547] rcu: RCU grace-period kthread stack dump:
> [   54.443812] rcu_sched I 0 10 2 0x80004000
> [   54.443814] Call Trace:
> [   54.445087] ? __schedule+0x540/0xa80
> [   54.446356] schedule+0x45/0xb0
> [   54.447612] schedule_timeout+0x144/0x280
> [   54.448859] ? __next_timer_interrupt+0xc0/0xc0
> [   54.450099] rcu_gp_kthread+0x3f0/0x840
> [   54.451329] kthread+0xe6/0x120
> [   54.452557] ? rcu_gp_slow.part.0+0x30/0x30
> [   54.453761] ? __kthread_create_on_node+0x150/0x150
> [   54.454943] ret_from_fork+0x1f/0x30
> [   54.456095] NMI backtrace for cpu 0
> [   54.457221] CPU: 0 PID: 2910 Comm: md5sum Not tainted 5.6.0-00001-g6e142c237f00 #1309
> [   54.458355] Hardware name: Gigabyte Technology Co., Ltd. GA-MA790FX-DQ6/GA-MA790FX-DQ6, BIOS F7g 07/19/2010
> [   54.459484] Call Trace:
> [   54.460576] <IRQ>
> [   54.461672] dump_stack+0x50/0x70
> [   54.462772] nmi_cpu_backtrace.cold+0x14/0x53
> [   54.463871] ? lapic_can_unplug_cpu.cold+0x3e/0x3e
> [   54.464955] nmi_trigger_cpumask_backtrace+0x7c/0x89
> [   54.466026] rcu_dump_cpu_stacks+0x7b/0xa9
> [   54.467088] rcu_sched_clock_irq.cold+0x153/0x38a
> [   54.468146] update_process_times+0x1f/0x50
> [   54.469204] tick_sched_timer+0x33/0x70
> [   54.470262] ? tick_sched_do_timer+0x50/0x50
> [   54.471321] __hrtimer_run_queues+0xe2/0x180
> [   54.472378] hrtimer_interrupt+0x109/0x240
> [   54.473423] smp_apic_timer_interrupt+0x48/0x80
> [   54.474461] apic_timer_interrupt+0xf/0x20
> [   54.475486] </IRQ>
> [   54.476495] RIP: 0033:0x556cbd33bf19
> [   54.477506] Code: ce 44 8b 4b 10 c1 c9 0f 01 d1 44 89 4c 24 c8 21 ce 31 c6 01 fe 41 8d bc 01 af 0f 7c f5 89 d0 44 8b 4b 3c c1 ce 0a 31 c8 01 ce <21> f0 31 d0 01 f8 41 8d bc 12 2a c6 87 47 89 ca 41 89 ea c1 c0 07
> [   54.479694] RSP: 002b:00007ffc30913ce8 EFLAGS: 00000283 ORIG_RAX: ffffffffffffff13
> [   54.480813] RAX: 00000000980270bd RBX: 0000556cbe81e4e0 RCX: 00000000c35c3b1a
> [   54.481943] RDX: 000000005b5e4ba7 RSI: 00000000ae8ee5ae RDI: 0000000009b5de85
> [   54.483075] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
> [   54.484201] R10: 0000000000000000 R11: 00000000b16eb4f8 R12: 0000000000000000
> [   54.485317] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000023604445
>

I don't see anything wireguard-related in this stacktrace. Can you try
sending one that has something wireguard-related in it? Or is more complete?

^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: soft lockup - may be related to wireguard (backported)
  2020-05-04  3:55 soft lockup - may be related to wireguard (backported) Wang Jian
  2020-05-04  5:26 ` Jason A. Donenfeld
  2020-05-04 10:47 ` Serge Belyshev
@ 2020-05-04 22:28 ` Jason A. Donenfeld
  2020-05-05  6:54   ` Wang Jian
  2020-05-05  0:05 ` Jason A. Donenfeld
  3 siblings, 1 reply; 10+ messages in thread

From: Jason A. Donenfeld @ 2020-05-04 22:28 UTC (permalink / raw)
To: Wang Jian, WireGuard mailing list

On 5/4/20 6:49 AM, Wang Jian wrote:
> Jason A. Donenfeld <Jason@zx2c4.com> wrote on Mon, May 4, 2020 at 1:26 PM:
>>
>> Are you routing wireguard over wireguard, or something interesting like that?
>>
>> Is ipsec being used?
>
> I don't think I have any fancy use cases. But wireguard over pppoe?
>
> Other details:
> - nftables' iptables-compatible mode is used, along with ipset
> - the pppoe link is the default route, and wg-quick is configured to install
>   an additional default route into a newly created routing table (2000)
> - ipset matches are used to MARK traffic to specific destinations in the
>   mangle table, PREROUTING & OUTPUT, for both v4 and v6
> - ip rules are added to match the fwmark and look up the new routing table
>   (2000, so traffic goes out via the wireguard interface) for both forwarded
>   traffic and output traffic

Can you send the full networking configuration in enough detail that I'll be
able to reliably reproduce this problem? If I can't reproduce it, it's
unlikely I'll be able to fix it.

Meanwhile, it really really looks from your stacktrace that you have one
wireguard interface going over another wireguard interface:

[27929.506367] wg_packet_send_staged_packets+0x320/0x5d0 [wireguard]
[27929.506426] wg_xmit+0x324/0x490 [wireguard]
[27929.506469] dev_hard_start_xmit+0x8d/0x1e0
[27929.506508] __dev_queue_xmit+0x721/0x8e0
[27929.506549] ip_finish_output2+0x19b/0x590
[27929.506604] ? nf_confirm+0xcb/0xf0 [nf_conntrack]
[27929.506648] ip_output+0x76/0xf0
[27929.506681] ? __ip_finish_output+0x1c0/0x1c0
[27929.506720] iptunnel_xmit+0x174/0x210
[27929.506761] send4+0x120/0x390 [wireguard]
[27929.506806] wg_socket_send_skb_to_peer+0x98/0xb0 [wireguard]
[27929.506860] wg_packet_tx_worker+0xa9/0x210 [wireguard]

Here, a wireguard encrypted udp packet is being sent to an endpoint that is
then being routed to a wireguard interface. What in your network config would
make that possible?

Jason

^ permalink raw reply [flat|nested] 10+ messages in thread
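[Editorial note: for context on the routing loop described above, not from the original mail. wg-quick guards against exactly this failure mode by tagging the interface's own encrypted UDP packets with a firewall mark and excluding them from the policy-routing lookup. A simplified sketch (the value 51820 is illustrative; wg-quick derives the actual mark/table number at runtime):]

```shell
# Sketch of wg-quick's loop-avoidance scheme, simplified.

# Encrypted packets emitted by wg0's own UDP socket carry this fwmark...
wg set wg0 fwmark 51820

# ...and only packets WITHOUT that mark are routed through the table
# whose default route points back into wg0.
ip route add default dev wg0 table 51820
ip rule add not fwmark 51820 table 51820
ip rule add table main suppress_prefixlength 0
```

If a user-added mangle-table MARK rule overwrites that fwmark on the encrypted packets before the routing decision, they can match the tunnel-bound ip rule and be fed back into wg0, producing exactly the wg_xmit-inside-wg_packet_tx_worker stack shown above.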
* Re: soft lockup - may be related to wireguard (backported)
  2020-05-04 22:28 ` Jason A. Donenfeld
@ 2020-05-05  6:54   ` Wang Jian
  0 siblings, 0 replies; 10+ messages in thread

From: Wang Jian @ 2020-05-05 6:54 UTC (permalink / raw)
To: Jason A. Donenfeld; +Cc: WireGuard mailing list

Jason A. Donenfeld <Jason@zx2c4.com> wrote on Tue, May 5, 2020 at 6:28 AM:
> Can you send full networking configuration in enough detail that I'll be
> able to reliably reproduce this problem? If I can't reproduce it, it's
> unlikely I'll be able to fix it.

I will send the full networking configuration to you privately. The
configuration works, and the problem occurred only once.

>
> Meanwhile, it really really looks from your stacktrace that you have one
> wireguard interface going over another wireguard interface:
>
> [27929.506367] wg_packet_send_staged_packets+0x320/0x5d0 [wireguard]
> [27929.506426] wg_xmit+0x324/0x490 [wireguard]
> [27929.506469] dev_hard_start_xmit+0x8d/0x1e0
> [27929.506508] __dev_queue_xmit+0x721/0x8e0
> [27929.506549] ip_finish_output2+0x19b/0x590
> [27929.506604] ? nf_confirm+0xcb/0xf0 [nf_conntrack]
> [27929.506648] ip_output+0x76/0xf0
> [27929.506681] ? __ip_finish_output+0x1c0/0x1c0
> [27929.506720] iptunnel_xmit+0x174/0x210
> [27929.506761] send4+0x120/0x390 [wireguard]
> [27929.506806] wg_socket_send_skb_to_peer+0x98/0xb0 [wireguard]
> [27929.506860] wg_packet_tx_worker+0xa9/0x210 [wireguard]
>
> Here, a wireguard encrypted udp packet is being sent to an endpoint that
> then is being routed to a wireguard interface. What in your network
> config would make that possible?

Maybe some erroneous ip route/rule magic that I haven't noticed.

^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: soft lockup - may be related to wireguard (backported)
  2020-05-04  3:55 soft lockup - may be related to wireguard (backported) Wang Jian
  ` (2 preceding siblings ...)
  2020-05-04 22:28 ` Jason A. Donenfeld
@ 2020-05-05  0:05 ` Jason A. Donenfeld
  3 siblings, 0 replies; 10+ messages in thread

From: Jason A. Donenfeld @ 2020-05-05 0:05 UTC (permalink / raw)
To: Wang Jian, WireGuard mailing list; +Cc: sultan

Sultan had a nice stroke of insight. Let's see if this fixes the issue:

https://git.zx2c4.com/wireguard-linux/commit/?h=stable&id=e21300c15cd8365017be856f944efa4db7b1cb95

^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads: [~2020-05-05 6:54 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-05-04  3:55 soft lockup - may be related to wireguard (backported) Wang Jian
2020-05-04  5:26 ` Jason A. Donenfeld
2020-05-04 12:49   ` Wang Jian
2020-05-04 13:49   ` Alex Xu (Hello71)
2020-05-04 16:51     ` Wang Jian
2020-05-04 10:47 ` Serge Belyshev
2020-05-04 22:55   ` Jason A. Donenfeld
2020-05-04 22:28 ` Jason A. Donenfeld
2020-05-05  6:54   ` Wang Jian
2020-05-05  0:05 ` Jason A. Donenfeld