Re: [musl] Crash in kill(..., SIGHUP) when using SA_ONSTACK

From: Rich Felker <dalias@libc.org>
To: Pablo Correa Gomez <pabloyoyoista@postmarketos.org>
Cc: musl@lists.openwall.com
Subject: Re: [musl] Crash in kill(..., SIGHUP) when using SA_ONSTACK
Date: Wed, 29 May 2024 09:15:34 -0400
Message-ID: <20240529131533.GH10433@brightrain.aerifal.cx>
In-Reply-To: <d8475b607b0c728b9133846c4faa469f9e4cad16.camel@postmarketos.org>

On Wed, May 29, 2024 at 02:04:25PM +0200, Pablo Correa Gomez wrote:
> Hi everybody,
> 
> I am responsible for musl CI in GNOME's GLib, and we have recently
> bumped into a crash that I have been unable to resolve. 
> 
> https://gitlab.gnome.org/GNOME/glib/-/commit/137db219a7266300ffde1aa75d781284fb0807cb
> introduced in GLib an alternate stack by setting the signal action
> SA_ONSTACK if available. However, the tests that were introduced, and
> that pass in most other libcs (there's CI for a lot more than just
> glibc and musl), crash in my Alpine Linux edge installation with SIGSEGV
> (stack trace below) while doing: kill (getpid(), SIGHUP)
> 
> I have verified that not adding SA_ONSTACK fixes the crash. Would
> anybody have some pointers of what could possibly be going wrong? If
> anybody is really interested, the public issue is
> https://gitlab.gnome.org/GNOME/glib/-/issues/3315
> 
> Stack trace
> ------------
> 
> Thread 1 "unix" received signal SIGSEGV, Segmentation fault.
> 0x00007ffff7fa96e8 in __syscall2 (a2=1, a1=17483, n=62) at
> ../arch/x86_64/syscall_arch.h:21
> warning: 21     ./arch/x86_64/syscall_arch.h: No such file or directory
> (gdb) bt
> #0  0x00007ffff7fa96e8 in __syscall2 (a2=1, a1=17483, n=62) at
> ../arch/x86_64/syscall_arch.h:21
> #1  kill (pid=17483, sig=sig@entry=1) at src/signal/kill.c:6
> #2  0x0000555555556e96 in test_signal (signum=signum@entry=1) at
> .../glib/tests/unix.c:534
> #3  0x0000555555557200 in test_signal_alternate_stack (signal=1) at
> .../glib/tests/unix.c:590
> #4  0x00007ffff7e8f364 in test_case_run (path=<optimized out>,
> test_run_name=0x55555555d3f0 "/glib-unix/sighup/alternate-stack",
> tc=0x55555555db60) at ../glib/gtestutils.c:2988
> #5  g_test_run_suite_internal (suite=suite@entry=0x55555555da70,
> path=path@entry=0x0) at ../glib/gtestutils.c:3090
> #6  0x00007ffff7e8f2db in g_test_run_suite_internal
> (suite=suite@entry=0x7ffff7ffee20, path=path@entry=0x0) at
> .../glib/gtestutils.c:3109
> #7  0x00007ffff7e8f2db in g_test_run_suite_internal
> (suite=suite@entry=0x7ffff7ffede0, path=path@entry=0x0) at
> .../glib/gtestutils.c:3109
> #8  0x00007ffff7e8f86a in g_test_run_suite
> (suite=suite@entry=0x7ffff7ffede0) at ../glib/gtestutils.c:3189
> #9  0x00007ffff7e8f8ea in g_test_run () at ../glib/gtestutils.c:2275
> #10 0x00005555555561f7 in main (argc=<optimized out>, argv=<optimized
> out>) at ../glib/tests/unix.c:910

Can you get a disassembly and register dump at the point of crash? My
best guess is that this is a simple stack overflow. There's really not
any other plausible reason for a segfault in kill(). The only
operations that touch memory in it are (on my build at least) a push
to realign the stack, and a call to __syscall_ret.

I'm not sure if the crashing code is running on the signal stack or
the main stack, but here's a thought: is it possible the CI machines
are running on a cpu/kernel with AVX512 or some similar extension
enabled, whose register file doesn't fit in MINSIGSTKSZ? If so,
using sysconf(_SC_MINSIGSTKSZ) (conditional on _SC_MINSIGSTKSZ being
defined) to size the alt stack should mitigate the problem. If
doing this, it should probably be allocated by mmap or malloc, since
in principle it could be too large for the caller's stack.
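
A rough sketch of what that could look like, in case it helps. This is
not the GLib code; the function names, the 4x headroom factor, and the
address-range check in the handler are all made up for illustration.
The alt stack is sized from sysconf(_SC_MINSIGSTKSZ) when that is
defined, and is allocated with mmap rather than on the caller's stack:

```c
#define _GNU_SOURCE
#include <signal.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical bookkeeping so the handler can tell which stack it is on. */
static volatile uintptr_t alt_base;
static volatile size_t alt_len;
static volatile sig_atomic_t handler_ran;
static volatile sig_atomic_t handler_on_alt_stack;

static void on_sighup(int sig)
{
    char probe; /* lives on whatever stack the handler is running on */
    uintptr_t p = (uintptr_t)&probe;
    (void)sig;
    handler_ran = 1;
    if (p >= alt_base && p - alt_base < alt_len)
        handler_on_alt_stack = 1;
}

/* Start from SIGSTKSZ and grow if the runtime minimum calls for it.
 * The factor of 4 is an arbitrary headroom assumption. */
size_t alt_stack_size(void)
{
    size_t size = (size_t)SIGSTKSZ;
#ifdef _SC_MINSIGSTKSZ
    long min = sysconf(_SC_MINSIGSTKSZ);
    if (min > 0 && (size_t)min * 4 > size)
        size = (size_t)min * 4;
#endif
    return size;
}

/* Returns 0 if the SIGHUP handler ran on the mmap'd alt stack, -1 otherwise.
 * Assumes SIGHUP is not blocked and the process is single-threaded. */
int raise_sighup_on_alt_stack(void)
{
    size_t size = alt_stack_size();
    struct sigaction sa, old_sa;
    stack_t ss, old_ss;
    int ret = -1;

    void *mem = mmap(NULL, size, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED)
        return -1;

    memset(&ss, 0, sizeof ss);
    ss.ss_sp = mem;
    ss.ss_size = size;
    if (sigaltstack(&ss, &old_ss) == 0) {
        alt_base = (uintptr_t)mem;
        alt_len = size;
        handler_ran = 0;
        handler_on_alt_stack = 0;

        memset(&sa, 0, sizeof sa);
        sa.sa_handler = on_sighup;
        sa.sa_flags = SA_ONSTACK;
        sigemptyset(&sa.sa_mask);
        if (sigaction(SIGHUP, &sa, &old_sa) == 0) {
            kill(getpid(), SIGHUP); /* same call as in the failing test */
            if (handler_ran && handler_on_alt_stack)
                ret = 0;
            sigaction(SIGHUP, &old_sa, NULL);
        }
        sigaltstack(&old_ss, NULL); /* restore the previous alt stack setting */
    }
    munmap(mem, size);
    return ret;
}
```

Calling raise_sighup_on_alt_stack() from main() and checking for a 0
return would exercise the same kill(getpid(), SIGHUP) path, but with a
stack that is sized at runtime and lives outside the caller's frame.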

It's also possible that the kernel may have some weird behavior
deciding if the task is already "running on the alt stack" when the
alt stack is embedded in the normal stack like this. Just getting rid
of that might be worth trying. If so, whether the problem manifests
could be subject to timing of signal delivery (although I would not
expect that for synchronously generated signals like here).
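
If you want to see what the kernel itself thinks, one possible
diagnostic (again only a sketch, with a made-up function name) is to
query the current alt stack state with sigaltstack(NULL, ...) and look
at SS_ONSTACK. Called from ordinary code just before the kill(), this
should report that the thread is not on the alt stack:

```c
#define _GNU_SOURCE
#include <signal.h>
#include <stddef.h>

/* Returns 1 if the kernel reports the calling thread as currently
 * executing on its alternate signal stack, 0 if not, -1 on error.
 * Passing NULL as the first argument only queries the current state. */
int kernel_reports_on_alt_stack(void)
{
    stack_t cur;
    if (sigaltstack(NULL, &cur) != 0)
        return -1;
    return (cur.ss_flags & SS_ONSTACK) ? 1 : 0;
}
```

A result of 1 from normal (non-handler) code would point at the
embedded alt stack confusing the kernel's check.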

Rich