Message-ID: <3201c36ee287e6d38e0f3805440a507de8fb52bf.camel@postmarketos.org>
From: Pablo Correa Gomez <pabloyoyoista@postmarketos.org>
To: Rich Felker <dalias@libc.org>
Cc: musl@lists.openwall.com
Date: Thu, 30 May 2024 12:17:59 +0200
In-Reply-To: <20240529131533.GH10433@brightrain.aerifal.cx>
References: 
	<d8475b607b0c728b9133846c4faa469f9e4cad16.camel@postmarketos.org>
	 <20240529131533.GH10433@brightrain.aerifal.cx>
Organization: postmarketOS
Content-Type: text/plain; charset="UTF-8"
MIME-Version: 1.0
Subject: Re: [musl] Crash in kill(..., SIGHUP) when using SA_ONSTACK

Hi Rich, thanks a lot for your reply.

On Wed, 29-05-2024 at 09:15 -0400, Rich Felker wrote:
> On Wed, May 29, 2024 at 02:04:25PM +0200, Pablo Correa Gomez wrote:
> > Hi everybody,
> >
> > I am responsible for musl CI in GNOME's GLib, and we have recently
> > bumped into a crash that I have been unable to resolve.
> >
> > https://gitlab.gnome.org/GNOME/glib/-/commit/137db219a7266300ffde1aa75d781284fb0807cb
> > introduced in GLib an alternate stack by setting the signal action
> > SA_ONSTACK if available. However, the tests that were introduced,
> > and that pass in most other libc's (there's CI for a lot more than
> > just glibc and musl) crash in my alpine linux edge installation
> > with SIGSEGV (stack trace below) while doing: kill (getpid(), SIGHUP)
> >
> > I have verified that not adding SA_ONSTACK fixes the crash. Would
> > anybody have some pointers of what could possibly be going wrong?
> > If anybody is really interested, the public issue is
> > https://gitlab.gnome.org/GNOME/glib/-/issues/3315
> >
> > Stack trace
> > ------------
> >
> > Thread 1 "unix" received signal SIGSEGV, Segmentation fault.
> > 0x00007ffff7fa96e8 in __syscall2 (a2=1, a1=17483, n=62) at
> > ../arch/x86_64/syscall_arch.h:21
> > warning: 21     ./arch/x86_64/syscall_arch.h: No such file or
> > directory
> > (gdb) bt
> > #0  0x00007ffff7fa96e8 in __syscall2 (a2=1, a1=17483, n=62) at
> > ../arch/x86_64/syscall_arch.h:21
> > #1  kill (pid=17483, sig=sig@entry=1) at src/signal/kill.c:6
> > #2  0x0000555555556e96 in test_signal (signum=signum@entry=1) at
> > .../glib/tests/unix.c:534
> > #3  0x0000555555557200 in test_signal_alternate_stack (signal=1) at
> > .../glib/tests/unix.c:590
> > #4  0x00007ffff7e8f364 in test_case_run (path=<optimized out>,
> > test_run_name=0x55555555d3f0 "/glib-unix/sighup/alternate-stack",
> > tc=0x55555555db60) at ../glib/gtestutils.c:2988
> > #5  g_test_run_suite_internal (suite=suite@entry=0x55555555da70,
> > path=path@entry=0x0) at ../glib/gtestutils.c:3090
> > #6  0x00007ffff7e8f2db in g_test_run_suite_internal
> > (suite=suite@entry=0x7ffff7ffee20, path=path@entry=0x0) at
> > .../glib/gtestutils.c:3109
> > #7  0x00007ffff7e8f2db in g_test_run_suite_internal
> > (suite=suite@entry=0x7ffff7ffede0, path=path@entry=0x0) at
> > .../glib/gtestutils.c:3109
> > #8  0x00007ffff7e8f86a in g_test_run_suite
> > (suite=suite@entry=0x7ffff7ffede0) at ../glib/gtestutils.c:3189
> > #9  0x00007ffff7e8f8ea in g_test_run () at
> > ../glib/gtestutils.c:2275
> > #10 0x00005555555561f7 in main (argc=<optimized out>,
> > argv=<optimized out>) at ../glib/tests/unix.c:910
>
> Can you get a disassembly and register dump at the point of crash?

(gdb) layout asm

 0x7ffff7fa96f9 <kill+7>     movslq %esi,%rsi
 0x7ffff7fa96fc <kill+10>    mov    $0x3e,%eax
 0x7ffff7fa9701 <kill+15>    syscall
>0x7ffff7fa9703 <kill+17>    mov    %rax,%rdi
 0x7ffff7fa9706 <kill+20>    call   0x7ffff7f7afb7 <__syscall_ret>
 0x7ffff7fa970b <kill+25>    add    $0x8,%rsp
 0x7ffff7fa970f <kill+29>    ret
 0x7ffff7fa9710 <killpg>     test   %edi,%edi
 0x7ffff7fa9712 <killpg+2>   js     0x7ffff7fa971b <killpg+11>
 0x7ffff7fa9714 <killpg+4>   neg    %edi
 0x7ffff7fa9716 <killpg+6>   jmp    0x7ffff7fa96f2 <kill>
 0x7ffff7fa971b <killpg+11>  sub    $0x8,%rsp
 0x7ffff7fa971f <killpg+15>  call   0x7ffff7f78bae <__errno_location>
 0x7ffff7fa9724 <killpg+20>  movl   $0x16,(%rax)
 0x7ffff7fa972a <killpg+26>  mov    $0xffffffff,%eax
 0x7ffff7fa972f <killpg+31>  add    $0x8,%rsp
 0x7ffff7fa9733 <killpg+35>  ret
 0x7ffff7fa9734 <psiginfo>   mov    (%rdi),%edi
 0x7ffff7fa9736 <psiginfo+2> jmp    0x7ffff7fa973b <psignal>
 0x7ffff7fa973b <psignal>    push   %r15
 0x7ffff7fa973d <psignal+2>  push   %r14
 0x7ffff7fa973f <psignal+4>  push   %r13
 0x7ffff7fa9741 <psignal+6>  lea    0x51938(%rip),%r13        # 0x7ffff7ffb080 <__stderr_FILE>
 0x7ffff7fa9748 <psignal+13> push   %r12
 0x7ffff7fa974a <psignal+15> xor    %r12d,%r12d
 0x7ffff7fa974d <psignal+18> push   %rbp
 0x7ffff7fa974e <psignal+19> push   %rbx
 0x7ffff7fa974f <psignal+20> mov    %rsi,%rbx
 0x7ffff7fa9752 <psignal+23> sub    $0x18,%rsp
 0x7ffff7fa9756 <psignal+27> call   0x7ffff7fb5780 <strsignal>

(gdb) info registers
rax            0x0                 0
rbx            0x7ffff7f55c30      140737353440304
rcx            0x7ffff7fa9703      140737353783043
rdx            0x0                 0
rsi            0x1                 1
rdi            0x525e              21086
rbp            0x1                 0x1
rsp            0x7fffffffd5d0      0x7fffffffd5d0
r8             0x0                 0
r9             0x80                128
r10            0x8                 8
r11            0x202               514
r12            0x7ffff7ffdb5c      140737354128220
r13            0x1                 1
r14            0x7fffffffd6d0      140737488344784
r15            0x7fffffffd6f0      140737488344816
rip            0x7ffff7fa9703      0x7ffff7fa9703 <kill+17>
eflags         0x202               [ IF ]
cs             0x33                51
ss             0x2b                43
ds             0x0                 0
es             0x0                 0
fs             0x0                 0
gs             0x0                 0
fs_base        0x7ffff7ffdb28      140737354128168
gs_base        0x0                 0

Does this tell you anything?

> I'm not sure if the crashing code is running on the signal stack or
> main stack, but here's a thought: is it possible the CI machines are
> running on a cpu/kernel with some monster AVX512 or whatever extension
> enabled with register file that doesn't fit in MINSIGSTKSZ?

That might be the case. It would explain why I could not reproduce the
crash on the 9-year-old laptop I was using last month, but can
reproduce it now on a new machine with a 13th Gen Intel(R) Core(TM)
i7-1360P.
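
One way to check this on a given machine is to compare the size of the alternate stack against the minimum the kernel reports. The following is only a sketch: the helper names kernel_min_sigstack and alt_stack_fits are made up for illustration, and it assumes a Linux kernel and libc that expose AT_MINSIGSTKSZ through getauxval(); where they do not, the helper returns 0 and the check says nothing.

```c
#include <signal.h>
#include <stddef.h>
#include <sys/auxv.h>

/* Hypothetical helper: minimum signal-frame size reported by the kernel
 * through the auxiliary vector, or 0 if it is not reported. */
static unsigned long kernel_min_sigstack(void)
{
#ifdef AT_MINSIGSTKSZ
    return getauxval(AT_MINSIGSTKSZ);
#else
    return 0;
#endif
}

/* Hypothetical helper: 1 if an alternate stack of `size` bytes is at
 * least as large as the kernel-reported minimum, 0 otherwise. */
static int alt_stack_fits(size_t size)
{
    return size >= kernel_min_sigstack();
}
```

Calling alt_stack_fits(MINSIGSTKSZ) on the old laptop and on the new machine would show whether the static constant is the limiting factor on the latter.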

> If so, using sysconf(_SC_MINSIGSTKSZ) (conditional on
> _SC_MINSIGSTKSZ being defined) to allocate the alt stack should
> mitigate the problem. If doing this, it should probably be allocated
> by mmap or malloc, since in principle it could be too large for the
> caller's stack.
>

I'll forward this to the maintainers and we'll see if we can come up
with a solution. Thanks a lot for your feedback!
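
For reference, a minimal sketch of how I understand the suggestion. This is not the GLib code: the helper names alt_stack_size and install_alt_stack and the 16 KiB of headroom are assumptions made for illustration. The size comes from sysconf(_SC_MINSIGSTKSZ) when that name is defined, and the stack is mapped with mmap rather than placed in the caller's stack frame.

```c
#define _GNU_SOURCE /* for MAP_ANONYMOUS and sigaltstack on strict -std= builds */
#include <signal.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: pick an alternate-stack size that covers the
 * run-time minimum when the libc exposes it. */
static size_t alt_stack_size(void)
{
    size_t size = SIGSTKSZ;
#ifdef _SC_MINSIGSTKSZ
    long min = sysconf(_SC_MINSIGSTKSZ);
    /* The 16 KiB of headroom for the handler's own frames is an
     * arbitrary value chosen for this sketch. */
    if (min > 0 && (size_t)min + 16384 > size)
        size = (size_t)min + 16384;
#endif
    return size;
}

/* Hypothetical helper: map the stack outside the caller's stack and
 * register it. Returns 0 on success, -1 on failure. */
static int install_alt_stack(void)
{
    stack_t ss;
    size_t size = alt_stack_size();
    void *mem = mmap(NULL, size, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

    if (mem == MAP_FAILED)
        return -1;

    ss.ss_sp = mem;
    ss.ss_size = size;
    ss.ss_flags = 0;
    if (sigaltstack(&ss, NULL) != 0) {
        munmap(mem, size);
        return -1;
    }
    return 0;
}
```

Handlers installed afterwards with SA_ONSTACK would then run on the mapped region. The mapping is intentionally left in place for the lifetime of the process in this sketch.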

> It's also possible that the kernel may have some weird behavior
> deciding if the task is already "running on the alt stack" when the
> alt stack is embedded in the normal stack like this. Just getting rid
> of that might be worth trying. If so, whether the problem manifests
> could be subject to timing of signal delivery (although I would not
> expect that for synchronously generated signals like here).
>
> Rich
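
To test the second idea, here is a sketch of a probe that keeps the alternate stack out of the normal stack and has the handler record whether it really ran there. Everything named here (probe_alt_stack, on_signal, the 64 KiB of headroom) is hypothetical and not taken from GLib or musl. It assumes a single-threaded process with the signal unblocked, so that the signal sent with kill(getpid(), ...) is delivered before kill returns.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

static volatile uintptr_t alt_lo, alt_hi;
static volatile sig_atomic_t handler_ran, handler_on_alt;

/* Record whether a local variable of the handler lives inside the
 * alternate stack. Only plain loads and stores are used here. */
static void on_signal(int signum)
{
    char marker;
    uintptr_t here = (uintptr_t)&marker;

    (void)signum;
    handler_on_alt = (here >= alt_lo && here < alt_hi);
    handler_ran = 1;
}

/* Hypothetical probe: returns 1 if the handler ran on the alternate
 * stack, 0 if it ran elsewhere, -1 on setup failure. */
static int probe_alt_stack(int signum)
{
    struct sigaction sa, old_sa;
    stack_t ss, old_ss;
    size_t size = 65536; /* arbitrary headroom for this sketch */
    char *mem;
    int result = -1;

#ifdef _SC_MINSIGSTKSZ
    long min = sysconf(_SC_MINSIGSTKSZ);
    if (min > 0)
        size += (size_t)min;
#endif
    mem = malloc(size); /* heap, not embedded in the caller's stack */
    if (mem == NULL)
        return -1;
    alt_lo = (uintptr_t)mem;
    alt_hi = alt_lo + size;

    ss.ss_sp = mem;
    ss.ss_size = size;
    ss.ss_flags = 0;
    if (sigaltstack(&ss, &old_ss) != 0) {
        free(mem);
        return -1;
    }

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_signal;
    sa.sa_flags = SA_ONSTACK;
    sigemptyset(&sa.sa_mask);
    handler_ran = 0;
    handler_on_alt = 0;
    if (sigaction(signum, &sa, &old_sa) == 0) {
        kill(getpid(), signum);
        sigaction(signum, &old_sa, NULL);
        if (handler_ran)
            result = handler_on_alt;
    }

    /* Put the previous alternate-stack setting back before freeing. */
    sigaltstack(&old_ss, NULL);
    free(mem);
    return result;
}
```

If probe_alt_stack(SIGHUP) returns 1 on the machine where the GLib test crashes, that would point at the size or placement of the original stack rather than at SA_ONSTACK itself.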