mailing list of musl libc
* musl pthread/tls issue.
@ 2014-10-22  6:33 黄建忠
  2014-10-22  7:08 ` Luca Barbato
                   ` (3 more replies)
  0 siblings, 4 replies; 8+ messages in thread
From: 黄建忠 @ 2014-10-22  6:33 UTC (permalink / raw)
  To: musl, Rich Felker

Hi, Rich and all.

I have recently finished building a bootable x86_64 system (rpm-based)
that includes musl, systemd, dracut, gcc-4.9.1, gcc-5, clang-3.5,
wayland/Xorg and the whole GNOME 3.14 desktop (except for the webkit JS
segfault issue I mentioned before), with a lot of patches (I will
release all of them once it reaches a stable state).

After a quick try, I found that gnome-shell segfaults when I open the
app list (not always, but often).

dmesg reports "pool [<some pid>] segfault xxxxxxxxxxx
libpixman-xxxxx", that is, the segfault is in the pixman library (a
common library used by Xorg and cairo).
gdb reports that it happens in a thread of gnome-shell, at the
beginning of the general_composite_rect function in pixman-general.c;
the pointer arguments cannot be accessed.

After a quick look, the problem seems to be in pixman-compiler.h, which
defines TLS-related code and macros according to the specific
implementation (win32/mingw/pthread and so on).
By default the TLS code is compiled; if it is disabled, there is still
a pthread fallback that uses 'pthread_key_create' and other pthread
functions.

Here is the link to it:
http://cgit.freedesktop.org/pixman/tree/pixman/pixman-compiler.h, please
look at the TLS section.

So there must be a problem in musl's pthread/TLS implementation that
can be triggered under certain circumstances. Please help to solve it.

Related components:
Kernel: linux-3.17.0 without patch.
binutils: 2.24.90 without patch.
Compiler: gcc-5 and clang-3.5. with musl-enable patch.
Pixman: 0.32.6 git
Cairo: 1.14.0
Mesa: 10.3.1 stable
Xorg: 1.16.1 stable
gnome: 3.14 stable




-- 
Huang JianZhong




* Re: musl pthread/tls issue.
  2014-10-22  6:33 musl pthread/tls issue 黄建忠
@ 2014-10-22  7:08 ` Luca Barbato
  2014-10-22  7:17   ` 黄建忠
  2014-10-22  7:27 ` Jens Gustedt
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 8+ messages in thread
From: Luca Barbato @ 2014-10-22  7:08 UTC (permalink / raw)
  To: musl

On 22/10/14 08:33, 黄建忠 wrote:
> That's to say, there must be a problem exist in musl pthread/tls
> implementation and can be triggered under certain circumstances. Please
> help to solve it.

Checking whether the condition also happens on glibc with the same code
path might help.

lu



* Re: musl pthread/tls issue.
  2014-10-22  7:08 ` Luca Barbato
@ 2014-10-22  7:17   ` 黄建忠
  0 siblings, 0 replies; 8+ messages in thread
From: 黄建忠 @ 2014-10-22  7:17 UTC (permalink / raw)
  To: musl

I have two bootable environments built from the same source repos,
differing only in the libc. It works well under glibc and, as I said,
only sometimes works under musl libc: it does not always segfault, but
it is very easy to trigger.

On 2014-10-22 15:08, Luca Barbato wrote:
> On 22/10/14 08:33, 黄建忠 wrote:
>> That's to say, there must be a problem exist in musl pthread/tls
>> implementation and can be triggered under certain circumstances. Please
>> help to solve it.
> trying if the condition happens on glibc with the same codepath might help.
>
> lu
>


-- 
Huang JianZhong






* Re: musl pthread/tls issue.
  2014-10-22  6:33 musl pthread/tls issue 黄建忠
  2014-10-22  7:08 ` Luca Barbato
@ 2014-10-22  7:27 ` Jens Gustedt
  2014-10-22  7:45 ` Szabolcs Nagy
  2014-10-22  7:58 ` Timo Teras
  3 siblings, 0 replies; 8+ messages in thread
From: Jens Gustedt @ 2014-10-22  7:27 UTC (permalink / raw)
  To: musl

On Wednesday, 22.10.2014, 14:33 +0800, 黄建忠 wrote:
> And after a quick look, the problem is in pixman-compiler.h, it defined
> TLS related codes and macros according to specific
> implementation(win32/mingw/pthread and so on).
> By default, the TLS codes will be compiled, if it was disabled, there is
> still a pthread fallback use 'pthread_key_create' and other pthread funcs.

Both the TLS and the pthread code look fishy to me. They define the TLS
"variables" as static, which restricts each of them to being used from
inside the same TU. This may mostly be the case, but it is certainly a
design restriction, if not a flaw.

Maybe the code that fails for you inlines certain functions depending
on compiler version, flags etc., and thus creates several copies of
these static variables? Just a wild guess.

Jens


-- 
:: INRIA Nancy Grand Est ::: AlGorille ::: ICube/ICPS :::
:: ::::::::::::::: office Strasbourg : +33 368854536   ::
:: :::::::::::::::::::::: gsm France : +33 651400183   ::
:: ::::::::::::::: gsm international : +49 15737185122 ::
:: http://icube-icps.unistra.fr/index.php/Jens_Gustedt ::





* Re: musl pthread/tls issue.
  2014-10-22  6:33 musl pthread/tls issue 黄建忠
  2014-10-22  7:08 ` Luca Barbato
  2014-10-22  7:27 ` Jens Gustedt
@ 2014-10-22  7:45 ` Szabolcs Nagy
  2014-10-24  7:35   ` 黄建忠
  2014-10-22  7:58 ` Timo Teras
  3 siblings, 1 reply; 8+ messages in thread
From: Szabolcs Nagy @ 2014-10-22  7:45 UTC (permalink / raw)
  To: musl; +Cc: Rich Felker

* 黄建忠 <jianzhong.huang@i-soft.com.cn> [2014-10-22 14:33:01 +0800]:
> These days, I finished build a bootable x86_64 system(rpm based) include
> musl/systemd/dracut/gcc-4.9.1/gcc-5/clang-3.5 and wayland/Xorg and the
> whole GNOME-3.14 desktop(except webkit js segfault issue I mentioned
> before) with a lot of patches(I will release all of them someday until
> it reach a stable state.)
> 
> After a simple try, I found gnome-shell will segfault If I triggered the
> app list(not always but often).
> 
> The dmesg report "pool [<some pid>] segfault xxxxxxxxxxx
> libpixman-xxxxx", That's to say, it segfault in pixman library(A common
> library used by Xorg and cairo),
> gdb report it's a thread issue(a thread of gnome-shell) and segfault at
> the beginning of general_composite_rect function in pixman-general.c,
> the pointer of argument can not be accessed.
> 

that's not enough info..

both the webkit js crash and this one sound like a thread stack overflow

> That's to say, there must be a problem exist in musl pthread/tls
> implementation and can be triggered under certain circumstances. Please
> help to solve it.
> 

i don't believe that without evidence: general_composite_rect itself
allocates >24k on the stack, that is about a third of the musl default
stack size

you can verify it by checking the diff of the top and bottom of the stack
(gdb backtrace prints the stack pointer, if the diff is >56k when that
func was entered then this was the problem) or by looking at
/proc/pid/maps and checking whether the crash happened in a guard page
next to a thread stack

to fix: make the application create a larger thread stack eg 1M
(pthread_attr_setstacksize, but gnome* will use gthread most likely
which has different api)



* Re: musl pthread/tls issue.
  2014-10-22  6:33 musl pthread/tls issue 黄建忠
                   ` (2 preceding siblings ...)
  2014-10-22  7:45 ` Szabolcs Nagy
@ 2014-10-22  7:58 ` Timo Teras
  3 siblings, 0 replies; 8+ messages in thread
From: Timo Teras @ 2014-10-22  7:58 UTC (permalink / raw)
  To: 黄建忠; +Cc: musl, Rich Felker

Is it perhaps this:
https://bugs.freedesktop.org/show_bug.cgi?id=35268

Does preloading libGL.so help?

We do this as a workaround for e.g. firefox currently (in Alpine Linux):
http://git.alpinelinux.org/cgit/aports/commit/?id=d9cda70e2c149004f1e87edd1de8f6e332e76953

/Timo



* Re: musl pthread/tls issue.
  2014-10-22  7:45 ` Szabolcs Nagy
@ 2014-10-24  7:35   ` 黄建忠
  2014-10-24 11:32     ` Szabolcs Nagy
  0 siblings, 1 reply; 8+ messages in thread
From: 黄建忠 @ 2014-10-24  7:35 UTC (permalink / raw)
  To: musl

Great clue, thanks.

It's a stack overflow issue.

The default pthread stack size is 81920 bytes, i.e. 80k.

I increased the stack size to 8M and the bug disappeared.

I had tried adding locks and making local copies, and had even found
that it was an overflow issue, but I was stupid enough to forget about
the thread stack size (since the default is sufficient under glibc).

As for webkit, the different codebases of webkitgtk behave differently:
2.4.x runs but reports a RangeError exception.
2.6.x (they call it webkitgtk4) uses the same codebase as ewebkit and
segfaults directly.

I guess it's related to the "fastmalloc" of JavaScriptCore.





-- 
Huang JianZhong




* Re: musl pthread/tls issue.
  2014-10-24  7:35   ` 黄建忠
@ 2014-10-24 11:32     ` Szabolcs Nagy
  0 siblings, 0 replies; 8+ messages in thread
From: Szabolcs Nagy @ 2014-10-24 11:32 UTC (permalink / raw)
  To: musl

* 黄建忠 <jianzhong.huang@i-soft.com.cn> [2014-10-24 15:35:46 +0800]:
> The default pthread stacksize is 81920, that's 80k.
> 
> I increase the stacksize to 8M and this bug disappear.

it would be nice to know what causes the stack usage in gnome-shell
(dynamic stack allocation can be a vulnerability)

> And about the webkit, the different codebase of webkitgtk had different
> behaviors:
> 2.4.x run but report a exception of RangeError.
> 2.6.x(they call it webkitgtk4) use the same codebase as ewebkit, directly
> segfault.
> 
> I guess it's related to the "fastmalloc" of JavaScriptCore.

it seems to be a tcmalloc variant by default, which can go wrong
in many ways so that should be disabled: try -DUSE_SYSTEM_MALLOC

