Hi,

I ran into an issue with WireGuard on Linux 5.4 kernels with the
PREEMPT_RT patch, on Ubuntu 18.04. I tried kernels 5.4.47-rt28 and
5.4.82-rt45.

Everything is fine until I send actual data to the machine through scp,
resulting in the kernel log below stating "BUG: scheduling while
atomic". I tried both the latest Ubuntu package (with wireguard-dkms
version 1.0.20201112) as well as compiling the kernel module from the
latest source from the wireguard-linux-compat repo, with the same
result.

Since the call trace mentions kernel_fpu_begin, I looked at the code and
the issue seems to occur while using SIMD for packet decryption. When I
forcibly disable SIMD with this simple bypass:

 static inline void simd_get(simd_context_t *ctx)
 {
-    *ctx = !IS_ENABLED(CONFIG_PREEMPT_RT_BASE) && may_use_simd() ? HAVE_FULL_SIMD : HAVE_NO_SIMD;
+    *ctx = HAVE_NO_SIMD;
 }

indeed everything works fine again (ignoring the performance hit).
I was unable to further pinpoint the issue, unfortunately. Any idea what
might be the cause?

Best regards,
Erik Schuitema

=== kernel log ===
000: BUG: scheduling while atomic: kworker/0:1/15/0x00000002
000: Modules linked in: wireguard(E) ip6_udp_tunnel udp_tunnel intel_rapl_msr 8250_dw nls_iso8859_1 intel_rapl_common x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass intel_cstate intel_rapl_perf joydev input_leds wmi_bmof intel_wmi_thunderbolt serio_raw snd_hda_codec_hdmi mei_me mei snd_hda_intel snd_intel_nhlt snd_hda_codec snd_hda_core snd_hwdep intel_lpss_pci snd_pcm intel_lpss snd_timer idma64 intel_pch_thermal virt_dma snd soundcore mac_hid acpi_pad ip6t_REJECT nf_reject_ipv6 nf_log_ipv6 sch_fq_codel xt_hl ip6t_rt ipt_REJECT nf_reject_ipv4 nf_log_ipv4 nf_log_common xt_LOG xt_limit xt_tcpudp xt_addrtype xt_conntrack ib_iser ip6table_filter rdma_cm ip6_tables iw_cm ib_cm nf_conntrack_netbios_ns nf_conntrack_broadcast nf_nat_ftp nf_nat ib_core nf_conntrack_ftp iscsi_tcp libiscsi_tcp nf_conntrack libiscsi nf_defrag_ipv6 scsi_transport_iscsi nf_defrag_ipv4 iptable_filter ip_tables x_tables autofs4 btrfs zstd_compress algif_skcipher af_alg dm_crypt raid10
000: raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear hid_logitech_hidpp hid_logitech_dj hid_generic usbhid hid amdgpu i915 gpu_sched ttm crct10dif_pclmul crc32_pclmul ghash_clmulni_intel drm_kms_helper syscopyarea nvme sysfillrect sysimgblt fb_sys_fops aesni_intel igb e1000e crypto_simd dca cryptd glue_helper ptp psmouse pps_core nvme_core i2c_algo_bit drm wmi video pinctrl_sunrisepoint
000: Preemption disabled at:
000: [<ffffffff87442203>] kernel_fpu_begin+0x13/0xd0
000: CPU: 0 PID: 15 Comm: kworker/0:1 Tainted: G E 5.4.47-rt28 #1
000: Hardware name: Intel(R) Client Systems NUC8i7HVK/NUC8i7HVB, BIOS HNKBLi70.86A.0064.2020.1028.1438 10/28/2020
000: Workqueue: wg-crypt-wg0 wg_packet_decrypt_worker [wireguard]
000: Call Trace:
000: dump_stack+0x6f/0x95
000: ? kernel_fpu_begin+0x13/0xd0
000: __schedule_bug+0x78/0xc0
000: __schedule+0x5f3/0x8b0
000: ? task_blocks_on_rt_mutex+0x17c/0x350
000: schedule+0x3d/0xe0
000: rt_spin_lock_slowlock_locked+0x103/0x2e0
000: rt_spin_lock_slowlock+0x57/0x90
000: rt_spin_lock+0x44/0x50
000: ? wg_packet_decrypt_worker+0xea/0x1c0 [wireguard]
000: wg_packet_decrypt_worker+0xff/0x1c0 [wireguard]
000: process_one_work+0x1ee/0x4d0
000: worker_thread+0x34/0x3f0
000: kthread+0x121/0x140
000: ? process_one_work+0x4d0/0x4d0
000: ? kthread_park+0x90/0x90
000: ret_from_fork+0x35/0x40
000: ------------[ cut here ]------------

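The trace above reads as follows: preemption was disabled in
kernel_fpu_begin, and the decrypt worker then reached rt_spin_lock, whose
slow path called schedule() while the preempt count was still non-zero.
The sketch below is a user-space model of that sequence, not kernel code.
All model_* names are hypothetical stand-ins introduced for illustration,
and the counter value it prints will not necessarily match the
0x00000002 seen in the real log.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* User-space model only: none of these are the real kernel functions. */
static unsigned int model_preempt_count; /* stand-in for the preempt count */
static unsigned int model_bug_reports;   /* how often the check fired */

/* Models kernel_fpu_begin(): the log shows preemption is disabled here. */
static void model_fpu_begin(void)
{
    model_preempt_count++;
}

/* Models kernel_fpu_end(): preemption is enabled again. */
static void model_fpu_end(void)
{
    assert(model_preempt_count > 0);
    model_preempt_count--;
}

/*
 * Models a contended rt_spin_lock(). In the trace its slow path ends in
 * schedule(), which is only legal when preemption is enabled.
 * Returns true if sleeping was allowed, false if the check fired.
 */
static bool model_rt_spin_lock_contended(const char *comm, int pid)
{
    if (model_preempt_count != 0) {
        printf("BUG: scheduling while atomic: %s/%d/0x%08x\n",
               comm, pid, model_preempt_count);
        model_bug_reports++;
        return false;
    }
    return true;
}
```

Calling model_fpu_begin() and then model_rt_spin_lock_contended() makes
the check fire; taking the lock only after model_fpu_end() does not. That
matches the observation that forcing HAVE_NO_SIMD makes the problem
disappear, since the FPU section is then never entered.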
Hi Erik,

Thanks for the report. I've fixed this here:
https://git.zx2c4.com/wireguard-linux-compat/commit/?id=8dcc75dbbe0a7b82c7c9a9388a49d1e32723d8a9
This will be part of the next wireguard-linux-compat snapshot release.

Jason

Jason A. Donenfeld wrote on 2020-12-19 13:15:
> Hi Erik,
>
> Thanks for the report. I've fixed this here:
> https://git.zx2c4.com/wireguard-linux-compat/commit/?id=8dcc75dbbe0a7b82c7c9a9388a49d1e32723d8a9
> This will be part of the next wireguard-linux-compat snapshot release.
>
> Jason
Thanks for the quick fix!
From your patch, I see that SIMD must be completely disabled for 5.4
PREEMPT_RT kernels.
Is this any different for kernels >=5.6?
Best regards,
Erik
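
The reading above, that SIMD ends up disabled whenever the kernel is a
PREEMPT_RT build, can be modelled in a few lines. This is a simplified
user-space sketch and not the code from the linked commit: struct
model_kernel, its fields, and model_simd_get are hypothetical, and the
enum only mirrors the two names used in the bypass from the first mail.

```c
#include <stdbool.h>

/* Simplified model of the SIMD context used in the bypass above. */
typedef enum { HAVE_NO_SIMD, HAVE_FULL_SIMD } simd_context_t;

/* Hypothetical description of the running kernel, for illustration. */
struct model_kernel {
    bool preempt_rt;  /* kernel was built with the PREEMPT_RT patch */
    bool simd_usable; /* stand-in for the result of may_use_simd() */
};

/*
 * Models the gating decision: SIMD is offered only when the kernel is
 * not a PREEMPT_RT build and SIMD is usable in the current context.
 */
static void model_simd_get(const struct model_kernel *k, simd_context_t *ctx)
{
    *ctx = (!k->preempt_rt && k->simd_usable) ? HAVE_FULL_SIMD
                                              : HAVE_NO_SIMD;
}
```

With preempt_rt set, the model always yields HAVE_NO_SIMD, which is the
source of the performance hit mentioned earlier in the thread.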
Hi Erik,

So far as I can tell, upstream is fine with this. I'd encourage you to
move to the newer LTS, 5.10. The compat stuff has always been pretty
meh. It was an important step in getting WireGuard bootstrapped, of
course, but just look at this horror:

https://git.zx2c4.com/wireguard-linux-compat/tree/src/compat/compat.h

I'll keep it working as people need, but folks should really really
move to the new LTS, now that it's out.

I've also backported upstream commit-by-commit to 5.4 (and android
4.19), for stable kernels, as used by Oracle, SUSE, Google, and so on:

https://git.zx2c4.com/wireguard-linux/log/?h=backport-5.4.y

This too is much preferable to using the compat stuff.

Jason

Hi Jason,

(Sorry for the delay in my reply..)

On 19/12/2020 19:16, Jason A. Donenfeld wrote:
> So far as I can tell, upstream is fine with this. I'd encourage you to
> move to the newer LTS, 5.10. The compat stuff has always been pretty
> meh. It was an important step in getting WireGuard bootstrapped, of
> course, but just look at this horror:
>
> https://git.zx2c4.com/wireguard-linux-compat/tree/src/compat/compat.h

I don't have doubts about the upstream code, I was merely wondering
whether the performance hit from disabling SIMD is still present in
newer kernels (it wasn't immediately obvious to me while browsing the
5.10 source).

> I'll keep it working as people need, but folks should really really
> move to the new LTS, now that it's out.

These efforts are highly appreciated! It's not trivial for me to switch
to a new kernel (needs extensive product testing), so I'm happy with the
5.4 patch. But I'll be sure to skip right to 5.10 when moving to a new
kernel.

Best regards,
Erik