mailing list of musl libc
* SIGSEGV related to threads since 1.1.20?
@ 2018-11-10 23:31 Sebastian Kemper
  2018-11-10 23:35 ` Sebastian Kemper
  2018-11-10 23:42 ` Rich Felker
  0 siblings, 2 replies; 4+ messages in thread
From: Sebastian Kemper @ 2018-11-10 23:31 UTC (permalink / raw)
  To: musl

Hello all,

I've got an issue with mariadb segfaulting, and apparently it has to do
with the switch from musl 1.1.19 to 1.1.20.

First off, I'm not a programmer, so the info below might be a bit off.

I maintain the mariadb package on OpenWrt. There was a report on the
issue tracker about a segfault:
https://github.com/openwrt/packages/issues/7230

I installed a current OpenWrt snapshot today, then installed
mariadb-server. Afterwards I ran

mysql_install_db --force --basedir=/usr

to init the database. And then there was a segfault:

Sat Nov 10 23:41:08 2018 kern.info kernel: [17053.144829] do_page_fault(): sending SIGSEGV to mysqld for invalid write access to 00000000
Sat Nov 10 23:41:08 2018 kern.info kernel: [17053.144839] epc = 77fc2058 in libc.so[77f4a000+93000]
Sat Nov 10 23:41:08 2018 kern.info kernel: [17053.144863] ra  = 77fc1fa0 in libc.so[77f4a000+93000]

The messages look the same as in the report, although the reporter got
there a different way (he attempted to connect to the running server,
whereas I tried to create a DB).

This is on an old D-Link router (mips_24kc, ar71xx). The reporter used
a different device (mips32r2, mir3g).

I went and compiled mariadb with debug symbols and installed the
unstripped binaries. Then I ran gdbserver on the mips device and
connected to it from my laptop. When I ran the commands in gdb I got
this output:

(gdb) c
Continuing.

Thread 2 "mysqld" received signal SIGSEGV, Segmentation fault.
__pthread_timedjoin_np (t=0x6bdced60, res=0x0, at=0x0) at src/thread/pthread_join.c:15
15                      if (state >= DT_DETACHED) a_crash();
(gdb) bt
#0  __pthread_timedjoin_np (t=0x6bdced60, res=0x0, at=0x0) at src/thread/pthread_join.c:15
#1  0x006bf754 in handle_bootstrap_impl (thd=<optimized out>) at /home/sk/tmp/openwrt/build_dir/target-mips_24kc_musl/mariadb-10.2.17/sql/sql_parse.cc:950
#2  0x006bfd58 in do_handle_bootstrap (thd=<optimized out>) at /home/sk/tmp/openwrt/build_dir/target-mips_24kc_musl/mariadb-10.2.17/sql/sql_parse.cc:1094
#3  0x006bfdfc in handle_bootstrap (arg=0x1dc7448) at /home/sk/tmp/openwrt/build_dir/target-mips_24kc_musl/mariadb-10.2.17/sql/sql_parse.cc:1077
#4  0x77fd10fc in start (p=0x77fd10fc <start+100>) at src/thread/pthread_create.c:147
#5  0x77f6702c in __clone () at src/thread/mips/clone.s:32
Backtrace stopped: frame did not save the PC

So apparently __pthread_timedjoin_np gets some NULL input and then the
program segfaults. I reran this with a breakpoint on the function: it
was hit before the segfault, and in those calls the arguments were not
NULL.

Anyway, I checked OpenWrt's GitHub history for what happened to musl
over the past months. On Sep 21 musl was upgraded from 1.1.19 to
1.1.20, so I reverted that commit and compiled 1.1.19. I then
downgraded musl on the router on the fly. As expected, that caused some
programs like dropbear to stop working properly due to missing symbols.

But when I ran 

mysql_install_db --force --basedir=/usr

it completed without errors. And once I upgraded to musl 1.1.20 I got
the segfault again.

I was hoping that maybe you could take a look at this :)

Kind regards,
Seb



* Re: SIGSEGV related to threads since 1.1.20?
From: Sebastian Kemper @ 2018-11-10 23:35 UTC (permalink / raw)
  To: musl

Oh, please keep me in the loop as I'm not subscribed!

Kind regards,
Seb



* Re: SIGSEGV related to threads since 1.1.20?
From: Rich Felker @ 2018-11-10 23:42 UTC (permalink / raw)
  To: Sebastian Kemper; +Cc: musl

On Sun, Nov 11, 2018 at 12:31:45AM +0100, Sebastian Kemper wrote:
> So apparently __pthread_timedjoin_np gets some NULL input and then the
> program segfaults. I reran this with a breakpoint on the function and it
> got called before the segfault and in these calls the args were not
> NULL.

This is an intentional trap for undefined behavior when the caller
attempts to join a detached thread or detach a thread that was not
joinable (already detached or already being joined by another thread).
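
To make the trap concrete: the check at src/thread/pthread_join.c:15 in the backtrace fires when the thread's detach state is DT_DETACHED or higher. The NULL res and at arguments are not the cause (they just mean "no result wanted" and "no timeout"); presumably a_crash() is what performs the deliberate write to address 00000000 seen in the kernel log. The model below is a simplified sketch, not musl's code: the state ordering, the model_* names and the trapped flag (standing in for a_crash()) are assumptions, and so is the detach path falling through to join, which is inferred from the backtrace (join reached from MariaDB with res=0x0, at=0x0) and from the name of the Alpine patch.

```c
#include <assert.h>
#include <stdatomic.h>

/* Assumed state ordering: only the ">= DETACHED" comparison is taken
 * from the backtrace; the other values are illustrative. */
enum { MODEL_EXITED = 0, MODEL_EXITING, MODEL_JOINABLE, MODEL_DETACHED };

struct model_thread { atomic_int detach_state; };

/* Stands in for a_crash(): records the trap instead of faulting. */
static int trapped;
static void model_crash(void) { trapped = 1; }

/* Join: a detached thread must never be joined, so trap on it. */
int model_join(struct model_thread *t)
{
	int state = atomic_load(&t->detach_state);
	if (state >= MODEL_DETACHED) {
		model_crash();
		return -1;
	}
	/* A real join would wait here until the thread has exited and then
	 * free it; joining the same thread twice is also undefined, but
	 * that case is not modeled. */
	atomic_store(&t->detach_state, MODEL_EXITED);
	return 0;
}

/* Detach: succeeds only if the thread is still joinable. Otherwise
 * (assumed behavior) it falls through to the join path, which traps
 * if the thread was already detached. */
int model_detach(struct model_thread *t)
{
	int expected = MODEL_JOINABLE;
	if (!atomic_compare_exchange_strong(&t->detach_state, &expected,
	                                    MODEL_DETACHED))
		return model_join(t);
	return 0;
}
```

Detaching a joinable handle succeeds once; a second model_detach on the same handle reaches the join path and trips the trap, which is the double-detach pattern described above.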

In the case of mariadb, it was reported as:

https://jira.mariadb.org/browse/MDEV-17200

and the corresponding Alpine Linux bug:

https://bugs.alpinelinux.org/issues/9407

The patch is available in Alpine Linux's aports repository:

https://git.alpinelinux.org/cgit/aports/tree/main/mariadb/fix-pthread-detach.patch

Rich
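
On the application side, the general rule is that each thread is joined exactly once or detached exactly once, never both. A minimal sketch of one way to enforce that, assuming a single thread owns each handle: the hypothetical tracked_* helpers and double_it worker below (not taken from MariaDB or the Alpine patch) remember whether the thread is still joinable and return EINVAL instead of calling pthread_join() or pthread_detach() a second time.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdint.h>

/* Hypothetical wrapper: remembers whether the thread may still be
 * joined or detached. Not safe if several threads share one handle. */
struct tracked_thread {
	pthread_t id;
	int joinable;
};

/* Create a joinable thread (default attributes). */
int tracked_create(struct tracked_thread *t, void *(*fn)(void *), void *arg)
{
	int err = pthread_create(&t->id, NULL, fn, arg);
	t->joinable = (err == 0);
	return err;
}

int tracked_join(struct tracked_thread *t, void **res)
{
	if (!t->joinable)
		return EINVAL;  /* joining now would be undefined behavior */
	t->joinable = 0;
	return pthread_join(t->id, res);
}

int tracked_detach(struct tracked_thread *t)
{
	if (!t->joinable)
		return EINVAL;  /* already detached or joined: do not call again */
	t->joinable = 0;
	return pthread_detach(t->id);
}

/* Example worker: returns twice its integer argument. */
void *double_it(void *arg)
{
	return (void *)((intptr_t)arg * 2);
}
```

Because the flag is a plain int, this only guards against sequential misuse by the owner; sharing one handle between threads would need a lock or an atomic compare-and-swap.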



* Re: SIGSEGV related to threads since 1.1.20?
From: Sebastian Kemper @ 2018-11-10 23:46 UTC (permalink / raw)
  To: Rich Felker; +Cc: musl

On Sat, Nov 10, 2018 at 06:42:59PM -0500, Rich Felker wrote:
> This is an intentional trap for undefined behavior when the caller
> attempts to join a detached thread or detach a thread that was not
> joinable (already detached or already being joined by another thread).

Hello Rich,

Thank you very much!!! I'll take a look at this :-)

Have a great weekend!

Kind regards,
Seb


