From: Pablo Correa Gomez <pabloyoyoista@postmarketos.org>
To: Rich Felker <dalias@libc.org>
Cc: musl@lists.openwall.com
Subject: Re: [musl] Crash in kill(..., SIGHUP) when using SA_ONSTACK
Date: Thu, 30 May 2024 12:17:59 +0200
Message-ID: <3201c36ee287e6d38e0f3805440a507de8fb52bf.camel@postmarketos.org>
In-Reply-To: <20240529131533.GH10433@brightrain.aerifal.cx>
Hi Rich, thanks a lot for your reply.
On Wed, 29-05-2024 at 09:15 -0400, Rich Felker wrote:
> On Wed, May 29, 2024 at 02:04:25PM +0200, Pablo Correa Gomez wrote:
> > Hi everybody,
> >
> > I am responsible for musl CI in GNOME's GLib, and we have recently
> > bumped into a crash that I have been unable to resolve.
> >
> > https://gitlab.gnome.org/GNOME/glib/-/commit/137db219a7266300ffde1aa75d781284fb0807cb
> > introduced in GLib an alternate stack by setting the signal action
> > SA_ONSTACK if available. However, the tests that were introduced, and
> > that pass in most other libc's (there's CI for a lot more than just
> > glibc and musl) crash in my alpine linux edge installation with SIGSEGV
> > (stack trace below) while doing: kill (getpid(), SIGHUP)
> >
> > I have verified that not adding SA_ONSTACK fixes the crash. Would
> > anybody have some pointers of what could possibly be going wrong? If
> > anybody is really interested, the public issue is
> > https://gitlab.gnome.org/GNOME/glib/-/issues/3315
> >
> > Stack trace
> > ------------
> >
> > Thread 1 "unix" received signal SIGSEGV, Segmentation fault.
> > 0x00007ffff7fa96e8 in __syscall2 (a2=1, a1=17483, n=62) at ../arch/x86_64/syscall_arch.h:21
> > warning: 21 ./arch/x86_64/syscall_arch.h: No such file or directory
> > (gdb) bt
> > #0 0x00007ffff7fa96e8 in __syscall2 (a2=1, a1=17483, n=62) at ../arch/x86_64/syscall_arch.h:21
> > #1 kill (pid=17483, sig=sig@entry=1) at src/signal/kill.c:6
> > #2 0x0000555555556e96 in test_signal (signum=signum@entry=1) at .../glib/tests/unix.c:534
> > #3 0x0000555555557200 in test_signal_alternate_stack (signal=1) at .../glib/tests/unix.c:590
> > #4 0x00007ffff7e8f364 in test_case_run (path=<optimized out>, test_run_name=0x55555555d3f0 "/glib-unix/sighup/alternate-stack", tc=0x55555555db60) at ../glib/gtestutils.c:2988
> > #5 g_test_run_suite_internal (suite=suite@entry=0x55555555da70, path=path@entry=0x0) at ../glib/gtestutils.c:3090
> > #6 0x00007ffff7e8f2db in g_test_run_suite_internal (suite=suite@entry=0x7ffff7ffee20, path=path@entry=0x0) at .../glib/gtestutils.c:3109
> > #7 0x00007ffff7e8f2db in g_test_run_suite_internal (suite=suite@entry=0x7ffff7ffede0, path=path@entry=0x0) at .../glib/gtestutils.c:3109
> > #8 0x00007ffff7e8f86a in g_test_run_suite (suite=suite@entry=0x7ffff7ffede0) at ../glib/gtestutils.c:3189
> > #9 0x00007ffff7e8f8ea in g_test_run () at ../glib/gtestutils.c:2275
> > #10 0x00005555555561f7 in main (argc=<optimized out>, argv=<optimized out>) at ../glib/tests/unix.c:910
>
> Can you get a disassembly and register dump at the point of crash?
(gdb) layout asm
0x7ffff7fa96f9 <kill+7> movslq %esi,%rsi
0x7ffff7fa96fc <kill+10> mov $0x3e,%eax
0x7ffff7fa9701 <kill+15> syscall
>0x7ffff7fa9703 <kill+17> mov %rax,%rdi
0x7ffff7fa9706 <kill+20> call 0x7ffff7f7afb7 <__syscall_ret>
0x7ffff7fa970b <kill+25> add $0x8,%rsp
0x7ffff7fa970f <kill+29> ret
0x7ffff7fa9710 <killpg> test %edi,%edi
0x7ffff7fa9712 <killpg+2> js 0x7ffff7fa971b <killpg+11>
0x7ffff7fa9714 <killpg+4> neg %edi
0x7ffff7fa9716 <killpg+6> jmp 0x7ffff7fa96f2 <kill>
0x7ffff7fa971b <killpg+11> sub $0x8,%rsp
0x7ffff7fa971f <killpg+15> call 0x7ffff7f78bae <__errno_location>
0x7ffff7fa9724 <killpg+20> movl $0x16,(%rax)
0x7ffff7fa972a <killpg+26> mov $0xffffffff,%eax
0x7ffff7fa972f <killpg+31> add $0x8,%rsp
0x7ffff7fa9733 <killpg+35> ret
0x7ffff7fa9734 <psiginfo> mov (%rdi),%edi
0x7ffff7fa9736 <psiginfo+2> jmp 0x7ffff7fa973b <psignal>
0x7ffff7fa973b <psignal> push %r15
0x7ffff7fa973d <psignal+2> push %r14
0x7ffff7fa973f <psignal+4> push %r13
0x7ffff7fa9741 <psignal+6> lea 0x51938(%rip),%r13 # 0x7ffff7ffb080 <__stderr_FILE>
0x7ffff7fa9748 <psignal+13> push %r12
0x7ffff7fa974a <psignal+15> xor %r12d,%r12d
0x7ffff7fa974d <psignal+18> push %rbp
0x7ffff7fa974e <psignal+19> push %rbx
0x7ffff7fa974f <psignal+20> mov %rsi,%rbx
0x7ffff7fa9752 <psignal+23> sub $0x18,%rsp
0x7ffff7fa9756 <psignal+27> call 0x7ffff7fb5780 <strsignal>
(gdb) info registers
rax 0x0 0
rbx 0x7ffff7f55c30 140737353440304
rcx 0x7ffff7fa9703 140737353783043
rdx 0x0 0
rsi 0x1 1
rdi 0x525e 21086
rbp 0x1 0x1
rsp 0x7fffffffd5d0 0x7fffffffd5d0
r8 0x0 0
r9 0x80 128
r10 0x8 8
r11 0x202 514
r12 0x7ffff7ffdb5c 140737354128220
r13 0x1 1
r14 0x7fffffffd6d0 140737488344784
r15 0x7fffffffd6f0 140737488344816
rip 0x7ffff7fa9703 0x7ffff7fa9703 <kill+17>
eflags 0x202 [ IF ]
cs 0x33 51
ss 0x2b 43
ds 0x0 0
es 0x0 0
fs 0x0 0
gs 0x0 0
fs_base 0x7ffff7ffdb28 140737354128168
gs_base 0x0 0
Does this tell you anything?
> I'm not sure if the crashing code is running on the signal stack or
> main stack, but here's a thought: is it possible the CI machines are
> running on a cpu/kernel with some monster AVX512 or whatever extension
> enabled with register file that doesn't fit in MINSIGSTKSZ?
That might be the case. It would explain why I could not reproduce the
crash on the 9-year-old laptop I was using last month, but can reproduce
it now on a new machine with a 13th Gen Intel(R) Core(TM) i7-1360P.
> If so,
> using sysconf(_SC_MINSIGSTKSZ) (conditional on _SC_MINSIGSTKSZ being
> defined) to allocate the alt stack should mitigate the problem. If
> doing this, it should probably be allocated by mmap or malloc, since
> in principle it could be too large for the caller's stack.
>
I'll forward this to the maintainers, and we'll see whether we can come
up with a solution. Thanks a lot for your feedback!
> It's also possible that the kernel may have some weird behavior
> deciding if the task is already "running on the alt stack" when the
> alt stack is embedded in the normal stack like this. Just getting rid
> of that might be worth trying. If so, whether the problem manifests
> could be subject to timing of signal delivery (although I would not
> expect that for synchronously generated signals like here).
>
> Rich