From: Pablo Correa Gomez
To: Rich Felker
Cc: musl@lists.openwall.com
Date: Thu, 30 May 2024 12:17:59 +0200
Subject: Re: [musl] Crash in kill(..., SIGHUP) when using SA_ONSTACK
In-Reply-To: <20240529131533.GH10433@brightrain.aerifal.cx>
References: <20240529131533.GH10433@brightrain.aerifal.cx>
Organization: postmarketOS

Hi Rich, thanks a lot for your reply.

On Wed, 29-05-2024 at 09:15 -0400, Rich Felker wrote:
> On Wed, May 29, 2024 at 02:04:25PM +0200, Pablo Correa Gomez wrote:
> > Hi everybody,
> >
> > I am responsible for musl CI in GNOME's GLib, and we have recently
> > bumped into a crash that I have been unable to resolve.
> >
> > https://gitlab.gnome.org/GNOME/glib/-/commit/137db219a7266300ffde1aa75d781284fb0807cb
> > introduced in GLib an alternate stack by setting the signal action
> > SA_ONSTACK if available. However, the tests that were introduced,
> > and that pass in most other libc's (there's CI for a lot more than
> > just glibc and musl) crash in my alpine linux edge installation
> > with SIGSEGV (stack trace below) while doing:
> > kill (getpid(), SIGHUP)
> >
> > I have verified that not adding SA_ONSTACK fixes the crash. Would
> > anybody have some pointers of what could possibly be going wrong?
> > If anybody is really interested, the public issue is
> > https://gitlab.gnome.org/GNOME/glib/-/issues/3315
> >
> > Stack trace
> > ------------
> >
> > Thread 1 "unix" received signal SIGSEGV, Segmentation fault.
> > 0x00007ffff7fa96e8 in __syscall2 (a2=1, a1=17483, n=62) at ../arch/x86_64/syscall_arch.h:21
> > warning: 21     ./arch/x86_64/syscall_arch.h: No such file or directory
> > (gdb) bt
> > #0  0x00007ffff7fa96e8 in __syscall2 (a2=1, a1=17483, n=62) at ../arch/x86_64/syscall_arch.h:21
> > #1  kill (pid=17483, sig=sig@entry=1) at src/signal/kill.c:6
> > #2  0x0000555555556e96 in test_signal (signum=signum@entry=1) at .../glib/tests/unix.c:534
> > #3  0x0000555555557200 in test_signal_alternate_stack (signal=1) at .../glib/tests/unix.c:590
> > #4  0x00007ffff7e8f364 in test_case_run (path=<optimized out>, test_run_name=0x55555555d3f0 "/glib-unix/sighup/alternate-stack", tc=0x55555555db60) at ../glib/gtestutils.c:2988
> > #5  g_test_run_suite_internal (suite=suite@entry=0x55555555da70, path=path@entry=0x0) at ../glib/gtestutils.c:3090
> > #6  0x00007ffff7e8f2db in g_test_run_suite_internal (suite=suite@entry=0x7ffff7ffee20, path=path@entry=0x0) at .../glib/gtestutils.c:3109
> > #7  0x00007ffff7e8f2db in g_test_run_suite_internal (suite=suite@entry=0x7ffff7ffede0, path=path@entry=0x0) at .../glib/gtestutils.c:3109
> > #8  0x00007ffff7e8f86a in g_test_run_suite (suite=suite@entry=0x7ffff7ffede0) at ../glib/gtestutils.c:3189
> > #9  0x00007ffff7e8f8ea in g_test_run () at ../glib/gtestutils.c:2275
> > #10 0x00005555555561f7 in main (argc=<optimized out>, argv=<optimized out>) at ../glib/tests/unix.c:910
>
> Can you get a disassembly and register dump at the point of crash?

(gdb) layout asm
   0x7ffff7fa96f9  movslq %esi,%rsi
   0x7ffff7fa96fc  mov    $0x3e,%eax
   0x7ffff7fa9701  syscall
  >0x7ffff7fa9703  mov    %rax,%rdi
   0x7ffff7fa9706  call   0x7ffff7f7afb7 <__syscall_ret>
   0x7ffff7fa970b  add    $0x8,%rsp
   0x7ffff7fa970f  ret
   0x7ffff7fa9710  test   %edi,%edi
   0x7ffff7fa9712  js     0x7ffff7fa971b
   0x7ffff7fa9714  neg    %edi
   0x7ffff7fa9716  jmp    0x7ffff7fa96f2
   0x7ffff7fa971b  sub    $0x8,%rsp
   0x7ffff7fa971f  call   0x7ffff7f78bae <__errno_location>
   0x7ffff7fa9724  movl   $0x16,(%rax)
   0x7ffff7fa972a  mov    $0xffffffff,%eax
   0x7ffff7fa972f  add    $0x8,%rsp
   0x7ffff7fa9733  ret
   0x7ffff7fa9734  mov    (%rdi),%edi
   0x7ffff7fa9736  jmp    0x7ffff7fa973b
   0x7ffff7fa973b  push   %r15
   0x7ffff7fa973d  push   %r14
   0x7ffff7fa973f  push   %r13
   0x7ffff7fa9741  lea    0x51938(%rip),%r13        # 0x7ffff7ffb080 <__stderr_FILE>
   0x7ffff7fa9748  push   %r12
   0x7ffff7fa974a  xor    %r12d,%r12d
   0x7ffff7fa974d  push   %rbp
   0x7ffff7fa974e  push   %rbx
   0x7ffff7fa974f  mov    %rsi,%rbx
   0x7ffff7fa9752  sub    $0x18,%rsp
   0x7ffff7fa9756  call   0x7ffff7fb5780

(gdb) info registers
rax            0x0                 0
rbx            0x7ffff7f55c30      140737353440304
rcx            0x7ffff7fa9703      140737353783043
rdx            0x0                 0
rsi            0x1                 1
rdi            0x525e              21086
rbp            0x1                 0x1
rsp            0x7fffffffd5d0      0x7fffffffd5d0
r8             0x0                 0
r9             0x80                128
r10            0x8                 8
r11            0x202               514
r12            0x7ffff7ffdb5c      140737354128220
r13            0x1                 1
r14            0x7fffffffd6d0      140737488344784
r15            0x7fffffffd6f0      140737488344816
rip            0x7ffff7fa9703      0x7ffff7fa9703
eflags         0x202               [ IF ]
cs             0x33                51
ss             0x2b                43
ds             0x0                 0
es             0x0                 0
fs             0x0                 0
gs             0x0                 0
fs_base        0x7ffff7ffdb28      140737354128168
gs_base        0x0                 0

Does this tell you anything?

> I'm not sure if the crashing code is running on the signal stack or
> main stack, but here's a thought: is it possible the CI machines are
> running on a cpu/kernel with some monster AVX512 or whatever
> extension enabled with register file that doesn't fit in MINSIGSTKSZ?

That might be the case.
It would explain why I could not reproduce it on the 9-year-old laptop I
was running last month, but I can reproduce it now on a new machine with
a 13th Gen Intel(R) Core(TM) i7-1360P.

> If so,
> using sysconf(_SC_MINSIGSTKSZ) (conditional on _SC_MINSIGSTKSZ being
> defined) to allocate the alt stack should mitigate the problem. If
> doing this, it should probably be allocated by mmap or malloc, since
> in principle it could be too large for the caller's stack.
>

I'll forward this to the maintainers; let's see if we can come up with
a solution. Thanks a lot for your feedback!

> It's also possible that the kernel may have some weird behavior
> deciding if the task is already "running on the alt stack" when the
> alt stack is embedded in the normal stack like this. Just getting rid
> of that might be worth trying. If so, whether the problem manifests
> could be subject to timing of signal delivery (although I would not
> expect that for synchronously generated signals like here).
>
> Rich
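
For reference, the mitigation suggested in the quoted text (size the alt
stack from sysconf(_SC_MINSIGSTKSZ) when that name is defined, and
allocate it outside the caller's stack) could look roughly like the
sketch below. This is an illustration only, not GLib's code. The names
alt_stack_size() and deliver_sighup_on_alt_stack() are made up, the
"kernel minimum plus SIGSTKSZ of headroom" policy is an assumption, and
mmap is just one of the two allocators mentioned. It also assumes a
single-threaded process with SIGHUP unblocked, so that the handler has
run by the time kill() returns.

```c
#define _GNU_SOURCE /* exposes MAP_ANONYMOUS, sigaltstack and friends */
#include <signal.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Bounds of the alternate stack, read by the handler. */
static uintptr_t alt_base;
static size_t alt_len;
static volatile sig_atomic_t ran_on_alt_stack;

/* Hypothetical helper: choose the alternate-stack size at run time. */
size_t alt_stack_size(void)
{
    size_t size = (size_t)SIGSTKSZ;
#ifdef _SC_MINSIGSTKSZ
    long min = sysconf(_SC_MINSIGSTKSZ);
    /* Assumed policy: kernel-reported minimum plus SIGSTKSZ of headroom. */
    if (min > 0)
        size = (size_t)min + (size_t)SIGSTKSZ;
#endif
    return size;
}

static void on_sighup(int sig)
{
    char marker; /* lives on whichever stack the handler is running on */
    uintptr_t p = (uintptr_t)&marker;
    (void)sig;
    ran_on_alt_stack = (p >= alt_base && p < alt_base + alt_len);
}

/* Hypothetical helper. Returns 1 if the handler ran on the alternate
 * stack, 0 if it did not, and -1 if setup failed. */
int deliver_sighup_on_alt_stack(void)
{
    size_t size = alt_stack_size();
    /* mmap keeps the alt stack off the caller's (possibly small) stack. */
    void *mem = mmap(NULL, size, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED)
        return -1;

    stack_t ss = { .ss_sp = mem, .ss_flags = 0, .ss_size = size };
    stack_t off = { .ss_sp = NULL, .ss_flags = SS_DISABLE, .ss_size = 0 };
    struct sigaction sa, old_sa;
    int result = -1;

    alt_base = (uintptr_t)mem;
    alt_len = size;
    ran_on_alt_stack = 0;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sighup;
    sa.sa_flags = SA_ONSTACK;
    sigemptyset(&sa.sa_mask);

    if (sigaltstack(&ss, NULL) == 0) {
        if (sigaction(SIGHUP, &sa, &old_sa) == 0) {
            /* Same trigger as the failing test. */
            if (kill(getpid(), SIGHUP) == 0)
                result = ran_on_alt_stack ? 1 : 0;
            sigaction(SIGHUP, &old_sa, NULL);
        }
        sigaltstack(&off, NULL); /* disable before unmapping */
    }
    munmap(mem, size);
    return result;
}
```

A caller would invoke deliver_sighup_on_alt_stack() once and treat any
result other than 1 as a sign that the alternate stack was not used.
Swapping mmap for malloc should work the same way, as long as the buffer
stays alive until the alt stack has been disabled.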