Development discussion of WireGuard
From: Serge Belyshev <belyshev@depni.sinp.msu.ru>
To: WireGuard mailing list <wireguard@lists.zx2c4.com>
Subject: Re: soft lockup - may be related to wireguard (backported)
Date: Mon, 04 May 2020 13:47:17 +0300
Message-ID: <878si8564q.fsf@depni.sinp.msu.ru>
In-Reply-To: <CAF75rJBKTbaK6CEQcmto=YcgA5NGrG85jSvSrYZpQV-L1xFMww@mail.gmail.com> (Wang Jian's message of "Mon, 4 May 2020 11:55:35 +0800")

[-- Attachment #1: Type: text/plain, Size: 3374 bytes --]

Hi! I can reproduce a similar RCU stall with a different kernel under
specific conditions on a specific box:

[   54.437636] rcu: INFO: rcu_sched self-detected stall on CPU
[   54.438838] rcu:  0-...!: (2101 ticks this GP) idle=ea6/1/0x4000000000000002 softirq=604/604 fqs=0 
[   54.440052]  (t=2101 jiffies g=69 q=89)
[   54.441273] rcu: rcu_sched kthread starved for 2101 jiffies! g69 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402 ->cpu=0
[   54.442547] rcu: RCU grace-period kthread stack dump:
[   54.443812] rcu_sched       I    0    10      2 0x80004000
[   54.443814] Call Trace:
[   54.445087]  ? __schedule+0x540/0xa80
[   54.446356]  schedule+0x45/0xb0
[   54.447612]  schedule_timeout+0x144/0x280
[   54.448859]  ? __next_timer_interrupt+0xc0/0xc0
[   54.450099]  rcu_gp_kthread+0x3f0/0x840
[   54.451329]  kthread+0xe6/0x120
[   54.452557]  ? rcu_gp_slow.part.0+0x30/0x30
[   54.453761]  ? __kthread_create_on_node+0x150/0x150
[   54.454943]  ret_from_fork+0x1f/0x30
[   54.456095] NMI backtrace for cpu 0
[   54.457221] CPU: 0 PID: 2910 Comm: md5sum Not tainted 5.6.0-00001-g6e142c237f00 #1309
[   54.458355] Hardware name: Gigabyte Technology Co., Ltd. GA-MA790FX-DQ6/GA-MA790FX-DQ6, BIOS F7g 07/19/2010
[   54.459484] Call Trace:
[   54.460576]  <IRQ>
[   54.461672]  dump_stack+0x50/0x70
[   54.462772]  nmi_cpu_backtrace.cold+0x14/0x53
[   54.463871]  ? lapic_can_unplug_cpu.cold+0x3e/0x3e
[   54.464955]  nmi_trigger_cpumask_backtrace+0x7c/0x89
[   54.466026]  rcu_dump_cpu_stacks+0x7b/0xa9
[   54.467088]  rcu_sched_clock_irq.cold+0x153/0x38a
[   54.468146]  update_process_times+0x1f/0x50
[   54.469204]  tick_sched_timer+0x33/0x70
[   54.470262]  ? tick_sched_do_timer+0x50/0x50
[   54.471321]  __hrtimer_run_queues+0xe2/0x180
[   54.472378]  hrtimer_interrupt+0x109/0x240
[   54.473423]  smp_apic_timer_interrupt+0x48/0x80
[   54.474461]  apic_timer_interrupt+0xf/0x20
[   54.475486]  </IRQ>
[   54.476495] RIP: 0033:0x556cbd33bf19
[   54.477506] Code: ce 44 8b 4b 10 c1 c9 0f 01 d1 44 89 4c 24 c8 21 ce 31 c6 01 fe 41 8d bc 01 af 0f 7c f5 89 d0 44 8b 4b 3c c1 ce 0a 31 c8 01 ce <21> f0 31 d0 01 f8 41 8d bc 12 2a c6 87 47 89 ca 41 89 ea c1 c0 07
[   54.479694] RSP: 002b:00007ffc30913ce8 EFLAGS: 00000283 ORIG_RAX: ffffffffffffff13
[   54.480813] RAX: 00000000980270bd RBX: 0000556cbe81e4e0 RCX: 00000000c35c3b1a
[   54.481943] RDX: 000000005b5e4ba7 RSI: 00000000ae8ee5ae RDI: 0000000009b5de85
[   54.483075] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
[   54.484201] R10: 0000000000000000 R11: 00000000b16eb4f8 R12: 0000000000000000
[   54.485317] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000023604445


Details and steps to reproduce:

1. kernel from git://github.com/ckolivas/linux  tag: 5.6-muqss-199
2. kernel .config in the attachment
3. boot with "threadirqs"
4. launch 100% cpu load for all threads, e.g.:  for N in {1..6}; do md5sum /dev/zero & done
5. observe that the box stops responding to pings via the wireguard interface.
6. after some time an RCU stall may be triggered (but not always).
7. further wireguard configuration details in the attachment.
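
Steps 3-5 above could be scripted roughly as follows. This is only a
sketch, not what was actually run: the function names (has_threadirqs,
start_load, stop_load) and the PEER address in the usage comment are
made up for illustration, and it assumes a POSIX shell with md5sum
available.

```shell
#!/bin/sh
# Sketch of steps 3-5. All function names here are hypothetical helpers.

# Step 3: succeed if the kernel command line contains "threadirqs".
# Checks the string in $1, or /proc/cmdline when no argument is given.
has_threadirqs() {
    cmdline=${1:-$(cat /proc/cmdline)}
    case " $cmdline " in
        *" threadirqs "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Step 4: start $1 background md5sum processes reading /dev/zero.
# Their PIDs are collected in LOAD_PIDS (space-separated).
start_load() {
    LOAD_PIDS=
    i=0
    while [ "$i" -lt "$1" ]; do
        md5sum /dev/zero >/dev/null 2>&1 &
        LOAD_PIDS="$LOAD_PIDS $!"
        i=$((i + 1))
    done
}

# Stop the load again. LOAD_PIDS is left unquoted on purpose so that it
# splits into one argument per PID.
stop_load() {
    kill $LOAD_PIDS 2>/dev/null || true
    wait $LOAD_PIDS 2>/dev/null || true
    LOAD_PIDS=
}

# Example use (PEER is an assumed address reachable only via wireguard):
#   has_threadirqs || echo "not booted with threadirqs"
#   start_load 6
#   ping -c 30 "$PEER"     # step 5: replies are expected to stop
#   stop_load
```

The load count of 6 matches the loop in step 4; on another box it would
need to match the number of hardware threads.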

Note that this is a heisenbug: it disappears when more debugging options are
enabled, and I cannot trigger it on a mainline kernel or with a different
scheduler configuration. On a different box (skylake laptop) with exactly
the same kernel it is very hard to trigger.


[-- Attachment #2: config.gz --]
[-- Type: application/gzip, Size: 25579 bytes --]

[-- Attachment #3: proc_cpuinfo.gz --]
[-- Type: application/gzip, Size: 775 bytes --]

[-- Attachment #4: wg-config-details.gz --]
[-- Type: application/gzip, Size: 1305 bytes --]
