mailing list of musl libc
* TLS (thread-local storage) support
@ 2012-10-04 21:13 Rich Felker
  2012-10-04 21:29 ` Daniel Cegiełka
  2012-10-05  3:04 ` Rich Felker
  0 siblings, 2 replies; 26+ messages in thread
From: Rich Felker @ 2012-10-04 21:13 UTC (permalink / raw)
  To: musl

Hi,

I've committed the initial version of thread-local storage
(__thread/_Thread_local keyword). So far, it only works in
static-linked applications, and might or might not be working properly
on arm, mips, and microblaze. The latter is a matter of whether these
archs need "TLS variant I" instead of the much cleaner/saner "variant
II" used by i386 and x86_64; unfortunately, Drepper's paper on TLS ABI
omits most of the interesting archs in favor of dying or dead ones
like Itanium, so I'm going to have to dig into other sources to find
out if musl needs to special-case any or all of these.

I also have the design for dynamic-linked TLS mostly worked out, but
need to make some changes to the dynamic linker to get it integrated.
Should be coming soon.

Reports of success or problems encountered, especially on non-x86
archs, would be interesting/welcome.

Note that if you've been building gcc with --disable-tls, __thread was
already working but gets emulated (very poorly; it's slow and will
abort() if it runs out of memory) through libgcc. Such compilers are
useless for testing the new real TLS support, so rebuild without
--disable-tls if needed before testing.
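
For anyone wanting a quick smoke test, something along these lines
might do. It is only a sketch: the names (counter, worker,
tls_is_per_thread) are made up for illustration and are not part of
musl. Each thread should see its own zero-initialized copy of the
variable, at an address different from the main thread's copy.

```c
#include <pthread.h>
#include <stdint.h>

/* One instance of this counter exists per thread. */
static __thread int counter = 0;

struct result {
	int value;       /* final value seen by the worker */
	uintptr_t addr;  /* address of the worker's copy */
};

static void *worker(void *arg)
{
	struct result *r = arg;
	/* A new thread starts from the initializer (0), not main's value. */
	for (int i = 0; i < 1000; i++) counter++;
	r->value = counter;
	r->addr = (uintptr_t)&counter;
	return 0;
}

/* Returns 1 if every thread had its own copy, 0 if not,
 * and -1 if a thread could not be created. */
int tls_is_per_thread(void)
{
	pthread_t t1, t2;
	struct result r1 = {0}, r2 = {0};

	counter = 5; /* main thread's copy */
	if (pthread_create(&t1, 0, worker, &r1)) return -1;
	if (pthread_create(&t2, 0, worker, &r2)) {
		pthread_join(t1, 0);
		return -1;
	}
	pthread_join(t1, 0);
	pthread_join(t2, 0);

	/* Workers are only compared against the live main thread; two
	 * finished workers could legitimately have reused one address. */
	return r1.value == 1000 && r2.value == 1000 && counter == 5
	    && r1.addr != (uintptr_t)&counter
	    && r2.addr != (uintptr_t)&counter;
}
```

Calling tls_is_per_thread() from main and checking for 1 is enough;
build it static first, since that is the case supported so far.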

Rich



* Re: TLS (thread-local storage) support
  2012-10-04 21:13 TLS (thread-local storage) support Rich Felker
@ 2012-10-04 21:29 ` Daniel Cegiełka
  2012-10-04 22:36   ` Rich Felker
  2012-10-05  3:04 ` Rich Felker
  1 sibling, 1 reply; 26+ messages in thread
From: Daniel Cegiełka @ 2012-10-04 21:29 UTC (permalink / raw)
  To: musl

great news! Finally able to compile Go (lang)...

thx,
Daniel



* Re: TLS (thread-local storage) support
  2012-10-04 21:29 ` Daniel Cegiełka
@ 2012-10-04 22:36   ` Rich Felker
  2012-10-06  8:17     ` Daniel Cegiełka
  0 siblings, 1 reply; 26+ messages in thread
From: Rich Felker @ 2012-10-04 22:36 UTC (permalink / raw)
  To: musl

On Thu, Oct 04, 2012 at 11:29:11PM +0200, Daniel Cegiełka wrote:
> great news! Finally able to compile Go (lang)...

Did Go fail with gcc's emulated TLS in libgcc? My impression is that
it should usually/always work, but it's just very slow and
low-quality (lazy allocation). This isn't gcc's fault, just the fact
that it's impossible to emulate correctly. On the other hand, Go might
be generating code that accesses TLS directly, in which case the
emulation may not suffice.

BTW, does Go work with static linking? If not, you might need to wait
to celebrate until I add the dynamic-linked TLS support...

Rich



* Re: TLS (thread-local storage) support
  2012-10-04 21:13 TLS (thread-local storage) support Rich Felker
  2012-10-04 21:29 ` Daniel Cegiełka
@ 2012-10-05  3:04 ` Rich Felker
  2012-10-05 17:27   ` Rich Felker
  1 sibling, 1 reply; 26+ messages in thread
From: Rich Felker @ 2012-10-05  3:04 UTC (permalink / raw)
  To: musl

On Thu, Oct 04, 2012 at 05:13:32PM -0400, Rich Felker wrote:
> Hi,
> 
> I've committed the initial version of thread-local storage
> (__thread/_Thread_local keyword). So far, it only works in
> static-linked applications,

Scratch that. It's now supported everywhere except dynamically loaded
(dlopen'd) shared libraries. And I'm working on adding support for
them too. So far only i386 is tested, but at least x86_64 is also very
likely to work (it's basically the same).

> and might or might not be working properly
> on arm, mips, and microblaze.

I believe it's working on ARM, but it's completely untested.
Microblaze and MIPS do not yet have the necessary relocation
processing, but TLS in the main executable (static or dynamic linked)
_might_ work.

> The latter is a matter of whether these
> archs need "TLS variant I" instead of the much cleaner/saner "variant
> II" used by i386 and x86_64;

So far, I can't see anywhere the variant is relevant to the ABI; it
seems we can just use "variant II" unconditionally. Let's hope I'm
right because I don't feel like dealing with more ugly, gratuitous
special-case code.

Rich



* Re: TLS (thread-local storage) support
  2012-10-05  3:04 ` Rich Felker
@ 2012-10-05 17:27   ` Rich Felker
  2012-10-06 14:33     ` Szabolcs Nagy
  0 siblings, 1 reply; 26+ messages in thread
From: Rich Felker @ 2012-10-05 17:27 UTC (permalink / raw)
  To: musl

On Thu, Oct 04, 2012 at 11:04:14PM -0400, Rich Felker wrote:
> On Thu, Oct 04, 2012 at 05:13:32PM -0400, Rich Felker wrote:
> > Hi,
> > 
> > I've committed the initial version of thread-local storage
> > (__thread/_Thread_local keyword). So far, it only works in
> > static-linked applications,
> 
> Scratch that. It's now supported everywhere except dynamically loaded
> (dlopen'd) shared libraries. And I'm working on adding support for

And they're working now too.

I've also made some general fixes and improvements to the dynamic
linker -- minor corrections in how library files are located, and
support for recursive calls to dlopen (happens when a library has
constructors and one of those constructors calls dlopen). This same
change was also necessary to avoid blocking pthread_create calls for
the entire duration of constructor execution.

Some further dynamic linker development directions:

- Unifying the relocation code in arch/$(ARCH)/reloc.h to minimize
  duplication.

- Adding dlsym() support for TLS vars (obtaining current thread's
  copy).

- Cleanup and reduction of code duplication - phdr parsing and symbol
  lookup logic is duplicated in several places.

And of course testing TLS on other archs and fixing anything that's
broken...


Rich



* Re: TLS (thread-local storage) support
  2012-10-04 22:36   ` Rich Felker
@ 2012-10-06  8:17     ` Daniel Cegiełka
  2012-10-16 21:27       ` boris brezillon
  0 siblings, 1 reply; 26+ messages in thread
From: Daniel Cegiełka @ 2012-10-06  8:17 UTC (permalink / raw)
  To: musl

2012/10/5 Rich Felker <dalias@aerifal.cx>:
> On Thu, Oct 04, 2012 at 11:29:11PM +0200, Daniel Cegiełka wrote:
>> great news! Finally able to compile Go (lang)...
>
> Did Go fail with gcc's emulated TLS in libgcc?

I tested Go with sabotage (with fresh musl). I'll try to do it again...
gcc in sabotage was compiled without support for TLS, so I didn't
expect that it would be successful:

https://github.com/rofl0r/sabotage/blob/master/pkg/gcc4

> My impression is that
> it should usually/always work, but it's just very slow and
> low-quality (lazy allocation). This isn't gcc's fault, just the fact
> that it's impossible to emulate correctly. On the other hand, Go might
> be generating code that accesses TLS directly, in which case the
> emulation may not suffice.
>
> BTW, does Go work with static linking? If not, you might need to wait
> to celebrate until I add the dynamic-linked TLS support...

https://groups.google.com/forum/?fromgroups=#!topic/golang-nuts/N5QCFkXon0M

Daniel

> Rich



* Re: TLS (thread-local storage) support
  2012-10-05 17:27   ` Rich Felker
@ 2012-10-06 14:33     ` Szabolcs Nagy
  2012-10-06 20:39       ` Szabolcs Nagy
  0 siblings, 1 reply; 26+ messages in thread
From: Szabolcs Nagy @ 2012-10-06 14:33 UTC (permalink / raw)
  To: musl

[-- Attachment #1: Type: text/plain, Size: 522 bytes --]

* Rich Felker <dalias@aerifal.cx> [2012-10-05 13:27:29 -0400]:
> On Thu, Oct 04, 2012 at 11:04:14PM -0400, Rich Felker wrote:
> > Scratch that. It's now supported everywhere except dynamically loaded
> > (dlopen'd) shared libraries. And I'm working on adding support for
> 
> And they're working now too.
> 

should the attached code work with dlopen
when compiled as a dso?

(i wanted to check if the alignments are ok after a dlopen,
but i can see how this usage may not be supported)

it seems it dies here in the ctor

[-- Attachment #2: tls.c --]
[-- Type: text/x-csrc, Size: 574 bytes --]

#include <stddef.h>

__thread char      c1 = 1;
__thread char      xchar = 2;
__thread char      c2 = 3;
__thread short     xshort = 4;
__thread char      c3 = 5;
__thread int       xint = 6;
__thread char      c4 = 7;
__thread long long xllong = 8;

struct {
	char *name;
	size_t size;
	size_t align;
	size_t addr;
} t[4];

#define entry(i,x) \
	t[i].name = #x; \
	t[i].size = sizeof x; \
	t[i].align = __alignof__(x); \
	t[i].addr = (size_t)&x;

__attribute__((constructor)) static void init(void)
{
	entry(0, xchar)
	entry(1, xshort)
	entry(2, xint)
	entry(3, xllong)
}
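
A self-checking variant of the same idea, in case it is useful: rather
than recording size/align/addr in a table, it tests each address
against the type's alignment directly. This is a sketch only; the
variable and function names are illustrative, and it relies on the
same GCC extensions (__thread, __alignof__) as the attachment.

```c
#include <stddef.h>
#include <stdint.h>

/* char-sized variables are interleaved to force padding decisions. */
static __thread char      a1 = 1;
static __thread short     a_short = 2;
static __thread char      a2 = 3;
static __thread int       a_int = 4;
static __thread char      a3 = 5;
static __thread long long a_llong = 6;

/* Nonzero if the address of x is not a multiple of its alignment. */
#define MISALIGNED(x) (((uintptr_t)&(x) % __alignof__(x)) != 0)

/* Number of TLS variables whose address violates their alignment. */
int count_misaligned_tls(void)
{
	return MISALIGNED(a_short) + MISALIGNED(a_int) + MISALIGNED(a_llong);
}

/* Also confirm the TLS image was copied: the initializers sum to 21. */
int tls_initial_values_ok(void)
{
	return a1 + a_short + a2 + a_int + a3 + a_llong == 21;
}
```

For the dlopen case the two functions would be called from a
constructor in the dso, as in the attachment.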



* Re: TLS (thread-local storage) support
  2012-10-06 14:33     ` Szabolcs Nagy
@ 2012-10-06 20:39       ` Szabolcs Nagy
  2012-10-06 20:58         ` Rich Felker
  0 siblings, 1 reply; 26+ messages in thread
From: Szabolcs Nagy @ 2012-10-06 20:39 UTC (permalink / raw)
  To: musl

* Szabolcs Nagy <nsz@port70.net> [2012-10-06 16:33:01 +0200]:
> should the attached code work with dlopen
> when compiled as a dso?
> 
> (i wanted to check if the alignments are ok after a dlopen,
> but i can see how this usage may not be supported)
> 
> it seems it dies here in the ctor

a more minimal example:

a.c:
__thread int xx;
int *p;
__attribute__((constructor)) static void init(void)
{
        p = &xx;
}

b.c:
#include <dlfcn.h>
void *h;
int main()
{
        h = dlopen("./a.so", RTLD_LAZY);
}

compiled as
musl-gcc -shared -fPIC -g -o a.so a.c
musl-gcc -g -o b b.c

./b segfaults in init at p=&xx



* Re: TLS (thread-local storage) support
  2012-10-06 20:39       ` Szabolcs Nagy
@ 2012-10-06 20:58         ` Rich Felker
  0 siblings, 0 replies; 26+ messages in thread
From: Rich Felker @ 2012-10-06 20:58 UTC (permalink / raw)
  To: musl

On Sat, Oct 06, 2012 at 10:39:39PM +0200, Szabolcs Nagy wrote:
> * Szabolcs Nagy <nsz@port70.net> [2012-10-06 16:33:01 +0200]:
> > should the attached code work with dlopen
> > when compiled as a dso?
> > 
> > (i wanted to check if the alignments are ok after a dlopen,
> > but i can see how this usage may not be supported)
> > 
> > it seems it dies here in the ctor
> 
> a more minimal example:
> 
> a.c:
> __thread int xx;
> int *p;
> __attribute__((constructor)) static void init(void)
> {
>         p = &xx;
> }
> 
> b.c:
> #include <dlfcn.h>
> void *h;
> int main()
> {
>         h = dlopen("./a.so", RTLD_LAZY);
> }
> 
> compiled as
> musl-gcc -shared -fPIC -g -o a.so a.c
> musl-gcc -g -o b b.c
> 
> ./b segfaults in init at p=&xx

Very stupid issue, fixed by commit
92e1cd9b0ba9a8fa86e0346b121e159fb88f99bc:

http://git.musl-libc.org/cgit/musl/commit/?id=92e1cd9b0ba9a8fa86e0346b121e159fb88f99bc

Rich



* Re: TLS (thread-local storage) support
  2012-10-06  8:17     ` Daniel Cegiełka
@ 2012-10-16 21:27       ` boris brezillon
  2012-10-16 21:47         ` boris brezillon
  0 siblings, 1 reply; 26+ messages in thread
From: boris brezillon @ 2012-10-16 21:27 UTC (permalink / raw)
  To: musl

Hi,

First I'd like to thank Rich for adding TLS support (I started to work
on it a few weeks ago but never had time to finish it).

2012/10/6 Daniel Cegiełka <daniel.cegielka@gmail.com>:
> 2012/10/5 Rich Felker <dalias@aerifal.cx>:
>> On Thu, Oct 04, 2012 at 11:29:11PM +0200, Daniel Cegiełka wrote:
>>> great news! Finally able to compile Go (lang)...
>>
>> Did Go fail with gcc's emulated TLS in libgcc?
>
> I tested Go with sabotage (with fresh musl). I'll try to do it again...
> gcc in sabotage was compiled without support for TLS, so I didn't
> expect that it will be successful:
>
> https://github.com/rofl0r/sabotage/blob/master/pkg/gcc4
>
There's at least one thing (maybe more) missing for go support with
musl : gcc 'split-stack' support (see http://blog.nella.org/?p=849 and
http://gcc.gnu.org/wiki/SplitStacks).

I'm also interested in split stack support in musl but for other
reasons (thread and coroutine stack automatic expansion).

For x86/x86_64 split stack is implemented using a field inside the
pthread struct which is accessed via %gs (or %fs for x86_64) and an
offset.

Currently this offset is defined at 0x30 (0x70 for x86_64) by the
TARGET_THREAD_SPLIT_STACK_OFFSET but only if TARGET_LIBC_PROVIDES_SSP
is defined (see gcc/config/i386/gnu-user.h or
gcc/config/i386/gnu-user64.h).

As far as I know musl does not support stack protection, but we could
at least patch gcc to define TARGET_THREAD_SPLIT_STACK_OFFSET when
using musl.

We also need to reserve a field in the musl pthread struct. There are
currently two fields named 'unused1' and 'unused2' but I'm not sure
they're really unused in every supported arch.
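
To make the layout constraint concrete, here is a rough sketch of
pinning a field to the offset gcc expects. The struct is hypothetical
(fake_tcb is not musl's struct pthread, and the reserved bytes stand
in for whatever fields really precede it); only the 0x30/0x70 values
come from the gcc headers mentioned above.

```c
#include <stddef.h>

/* Offsets quoted above: 0x70 on x86_64, 0x30 on i386.
 * Other archs fall back to 0x30 here purely for illustration. */
#if defined(__x86_64__)
#define SPLIT_STACK_OFFSET 0x70
#else
#define SPLIT_STACK_OFFSET 0x30
#endif

/* Hypothetical thread control block. */
struct fake_tcb {
	char reserved[SPLIT_STACK_OFFSET]; /* placeholder for earlier fields */
	void *stack_limit;                 /* slot the split-stack prologue reads */
};

/* Fail the build if someone reorders fields and moves the slot. */
_Static_assert(offsetof(struct fake_tcb, stack_limit) == SPLIT_STACK_OFFSET,
               "split-stack slot is not at the offset gcc expects");

size_t split_stack_field_offset(void)
{
	return offsetof(struct fake_tcb, stack_limit);
}
```

A compile-time check like this would catch an accidental change to
the struct before it turns into a runtime crash.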


BTW, I'd like to work on a more integrated support of split stack in MUSL :

1) support in dynamic linker (see the last point of
http://gcc.gnu.org/wiki/SplitStacks) : check split stack notes in
shared libs (and program ?)

2) support in thread implementation : currently when a thread is
created the stack limit is set afterward (see
https://github.com/mirrors/gcc/blob/master/libgcc/generic-morestack-thread.c
and https://github.com/mirrors/gcc/blob/master/libgcc/config/i386/morestack.S)
and the stack size is supposed to be 16K (which is the minimum stack
size). This means we may reallocate a new stack chunk even if the
previous one (the first one) is not fully used.
If stack limit is set by thread implementation, this can be set
appropriately according to the stack size defined by the thread
creator.

3) more optimizations I haven't thought about yet...
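
To illustrate point 2), a back-of-the-envelope sketch of the
difference between a limit derived from the real stack size and one
derived from an assumed 16K stack. Everything here is an assumption
made for illustration: the function names, the 4096-byte slack, and
the simplified model of how the limit is placed. It is not taken from
libgcc or musl.

```c
#include <stddef.h>
#include <stdint.h>

#define ASSUMED_STACK_SIZE (16 * 1024) /* the 16K mentioned above */
#define STACK_SLACK        4096        /* hypothetical reserve below the limit */

/* Limit when the thread implementation knows the real stack:
 * just above the lowest address, leaving some slack. */
uintptr_t limit_from_real_size(uintptr_t lowest, size_t size)
{
	if (size <= STACK_SLACK) return lowest + size; /* nothing usable */
	return lowest + STACK_SLACK;
}

/* Limit when only the top of the stack is known and 16K is assumed. */
uintptr_t limit_from_assumed_size(uintptr_t top)
{
	return top - ASSUMED_STACK_SIZE;
}

/* Bytes of already-allocated stack that would go unused before a
 * new chunk is requested under the 16K assumption. */
size_t wasted_bytes(uintptr_t lowest, size_t size)
{
	uintptr_t real = limit_from_real_size(lowest, size);
	uintptr_t assumed = limit_from_assumed_size(lowest + size);
	return assumed > real ? (size_t)(assumed - real) : 0;
}
```

With a 1M stack almost all of it sits below the assumed limit, which
is the waste described in point 2).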

Do you have any concern about adding those features in musl ?

Let me know if you see other issues I haven't noticed.


Regards,

Boris



* Re: TLS (thread-local storage) support
  2012-10-16 21:27       ` boris brezillon
@ 2012-10-16 21:47         ` boris brezillon
  2012-10-16 22:09           ` Szabolcs Nagy
                             ` (2 more replies)
  0 siblings, 3 replies; 26+ messages in thread
From: boris brezillon @ 2012-10-16 21:47 UTC (permalink / raw)
  To: musl

2012/10/16 boris brezillon <b.brezillon.musl@gmail.com>:
> Hi,
>
> First I'd like to thank Rich for adding TLS support (I started to work
> on it a few weeks ago but never had time to finish it).
>
> 2012/10/6 Daniel Cegiełka <daniel.cegielka@gmail.com>:
>> 2012/10/5 Rich Felker <dalias@aerifal.cx>:
>>> On Thu, Oct 04, 2012 at 11:29:11PM +0200, Daniel Cegiełka wrote:
>>>> great news! Finally able to compile Go (lang)...
>>>
>>> Did Go fail with gcc's emulated TLS in libgcc?
>>
>> I tested Go with sabotage (with fresh musl). I'll try to do it again...
>> gcc in sabotage was compiled without support for TLS, so I didn't
>> expect that it will be successful:
>>
>> https://github.com/rofl0r/sabotage/blob/master/pkg/gcc4
>>
> There's at least one thing (maybe more) missing for go support with
> musl : gcc 'split-stack' support (see http://blog.nella.org/?p=849 and
> http://gcc.gnu.org/wiki/SplitStacks).
>
> I'm also interested in split stack support in musl but for other
> reasons (thread and coroutine stack automatic expansion).
>
> For x86/x86_64 split stack is implemented using a field inside the
> pthread struct which is accessed via %gs (or %fs for x86_64) and an
> offset.
>
> Currently this offset is defined at 0x30 (0x70 for x86_64) by the
> TARGET_THREAD_SPLIT_STACK_OFFSET but only if TARGET_LIBC_PROVIDES_SSP
> is defined (see gcc/config/i386/gnu-user.h or
> gcc/config/i386/gnu-user64.h).
>
> As far as I know musl does not support stack protection, but we could
> at least patch gcc to define TARGET_THREAD_SPLIT_STACK_OFFSET when
> using musl.
>
> We also need to reserve a field in the musl pthread struct. There are
> currently two fields named 'unused1' and 'unused2' but I'm not sure
> they're really unused in every supported arch.
>
>
> BTW, I'd like to work on a more integrated support of split stack in MUSL :
>
> 1) support in dynamic linker (see the last point of
> http://gcc.gnu.org/wiki/SplitStacks) : check split stack notes in
> shared libs (and program ?)
>
> 2) support in thread implementation : currently when a thread is
> created the stack limit is set afterward (see
> https://github.com/mirrors/gcc/blob/master/libgcc/generic-morestack-thread.c
> and https://github.com/mirrors/gcc/blob/master/libgcc/config/i386/morestack.S)
> and the stack size is supposed to be 16K (which is the minimum stack
> size). This means we may reallocate a new stack chunk even if the
> previous one (the first one) is not fully used.
> If stack limit is set by thread implementation, this can be set
> appropriately according to the stack size defined by the thread
> creator.
>
> 3) more optimizations I haven't thought about yet...
>
4) Compile musl with '-fsplit-stack' and add no_split_stack attribute
to appropriate functions (at least all functions called before
pthread_self_init because %gs or %fs register is unusable before this
call).

5) set main thread stack limit to 0 (pthread_self_init) : the main
thread stack growth is handled by the kernel.

6) add no-split-stack note to every asm file.

7) make split stack support optional (either by checking the
-fsplit-stack option in CFLAGS or with a specific option :
--enable-split-stack) : split stack adds overhead to every function
(except for those with the 'no_split_stack' attribute).

> Do you have any concern about adding those features in musl ?
>
> Let me know if you see other issues I haven't noticed.
>
>
> Regards,
>
> Boris



* Re: TLS (thread-local storage) support
  2012-10-16 21:47         ` boris brezillon
@ 2012-10-16 22:09           ` Szabolcs Nagy
  2012-10-16 23:16             ` boris brezillon
  2012-10-16 23:29             ` Rich Felker
  2012-10-16 22:54           ` Rich Felker
  2012-10-19 18:39           ` orc
  2 siblings, 2 replies; 26+ messages in thread
From: Szabolcs Nagy @ 2012-10-16 22:09 UTC (permalink / raw)
  To: musl

* boris brezillon <b.brezillon.musl@gmail.com> [2012-10-16 23:47:52 +0200]:
> > There's at least one thing (maybe more) missing for go support with
> > musl : gcc 'split-stack' support (see http://blog.nella.org/?p=849 and
> > http://gcc.gnu.org/wiki/SplitStacks).
> >

why does go need support from libc?

it has its own runtime and libraries on raw syscalls

> 4) Compile musl with '-fsplit-stack' and add no_split_stack attribute
> to appropriate functions (at least all functions called before
> pthread_self_init because %gs or %fs register is unusable before this
> call).
> 

what does a no_split_stack function do when it runs out of stack?

most functions in musl may be run before pthread_self_init
(it runs on demand when a pthread function is used)

what's the use of split stack if some functions may not work with it?



* Re: TLS (thread-local storage) support
  2012-10-16 21:47         ` boris brezillon
  2012-10-16 22:09           ` Szabolcs Nagy
@ 2012-10-16 22:54           ` Rich Felker
  2012-10-16 23:39             ` boris brezillon
  2012-10-19 18:39           ` orc
  2 siblings, 1 reply; 26+ messages in thread
From: Rich Felker @ 2012-10-16 22:54 UTC (permalink / raw)
  To: musl

On Tue, Oct 16, 2012 at 11:47:52PM +0200, boris brezillon wrote:
> 2012/10/16 boris brezillon <b.brezillon.musl@gmail.com>:
> > Hi,
> >
> > First I'd like to thank Rich for adding TLS support (I started to work
> > on it a few weeks ago but never had time to finish it).
> >
> > 2012/10/6 Daniel Cegiełka <daniel.cegielka@gmail.com>:
> >> 2012/10/5 Rich Felker <dalias@aerifal.cx>:
> >>> On Thu, Oct 04, 2012 at 11:29:11PM +0200, Daniel Cegiełka wrote:
> >>>> great news! Finally able to compile Go (lang)...
> >>>
> >>> Did Go fail with gcc's emulated TLS in libgcc?
> >>
> >> I tested Go with sabotage (with fresh musl). I'll try to do it again...
> >> gcc in sabotage was compiled without support for TLS, so I didn't
> >> expect that it will be successful:
> >>
> >> https://github.com/rofl0r/sabotage/blob/master/pkg/gcc4
> >>
> > There's at least one thing (maybe more) missing for go support with
> > musl : gcc 'split-stack' support (see http://blog.nella.org/?p=849 and
> > http://gcc.gnu.org/wiki/SplitStacks).
> >
> > I'm also interested in split stack support in musl but for other
> > reasons (thread and coroutine stack automatic expansion).
> >
> > For x86/x86_64 split stack is implemented using a field inside the
> > pthread struct which is accessed via %gs (or %fs for x86_64) and an
> > offset.
> >
> > Currently this offset is defined at 0x30 (0x70 for x86_64) by the
> > TARGET_THREAD_SPLIT_STACK_OFFSET but only if TARGET_LIBC_PROVIDES_SSP
> > is defined (see gcc/config/i386/gnu-user.h or
> > gcc/config/i386/gnu-user64.h).
> >
> > As far as I know musl does not support stack protection, but we could
> > at least patch gcc to define TARGET_THREAD_SPLIT_STACK_OFFSET when
> > using musl.
> >
> > We also need to reserve a field in the musl pthread struct. There are
> > currently two fields named 'unused1' and 'unused2' but I'm not sure
> > they're really unused in every supported arch.
> >
> >
> > BTW, I'd like to work on a more integrated support of split stack in MUSL :

I'm not a fan of split-stack for various reasons, but I have no
objection to adding support to make it work as long as it's an
optional feature that does not impair non-split-stack usage.

> > 1) support in dynamic linker (see the last point of
> > http://gcc.gnu.org/wiki/SplitStacks) : check split stack notes in
> > shared libs (and program ?)

It could be done, but is it really useful? There are infinitely many
ways you can crash a program with libraries that were not built
correctly for use with it. Checking for one of them seems like
gratuitous complexity with little benefit.

> > 2) support in thread implementation : currently when a thread is
> > created the stack limit is set afterward (see
> > https://github.com/mirrors/gcc/blob/master/libgcc/generic-morestack-thread.c
> > and https://github.com/mirrors/gcc/blob/master/libgcc/config/i386/morestack.S)
> > and the stack size is supposed to be 16K (which is the minimum stack
> > size). This means we may reallocate a new stack chunk even if the
> > previous one (the first one) is not fully used.
> > If stack limit is set by thread implementation, this can be set
> > appropriately according to the stack size defined by the thread
> > creator.

That's perfectly reasonable to support.

> > 3) more optimizations I haven't thought about yet...
> >
> 4) Compile musl with '-fsplit-stack' and add no_split_stack attribute
> to appropriate functions (at least all functions called before
> pthread_self_init because %gs or %fs register is unusable before this
> call).

This is definitely not desirable, at least not by default. It hurts
performance, possibly a lot, and destroys async-signal-safety. Also I
doubt it's needed. As long as split stack mode leaves at least ~8k
when calling a new function, most if not all functions in musl should
run fine without needing support for enlarging the stack.

> 5) set main thread stack limit to 0 (pthread_self_init) : the main
> thread stack growth is handled by the kernel.
> 
> 6) add no-split-stack note to every asm file.

I'm against this, or any boilerplate clutter. If it's really needed,
it should be possible with CFLAGS (or "ASFLAGS"), rather than
modifying every file, and if there's no way to do it with command line
options, that's a bug in gas.

With that said, why would it be needed? I don't think there are any
asm files that use more than 32 bytes of stack...

> 7) make split stack support optional (either by checking the
> -fsplit-stack option in CFLAGS or with a specific option :
> --enable-split-stack) : split stack adds overhead to every function
> (except for those with the 'no_split_stack' attribute).
> 
> > Do you have any concern about adding those features in musl ?

Basically, the whole idea of split-stack is antithetical to the QoI
guarantees of musl. A program using split-stack can crash at any time
due to out-of-memory, and there is no reliable/portable way to recover
from this condition. It's much like the following low-quality aspects
of glibc and default Linux config:

- overcommit
- lazy allocation of libc-internal storage
- lazy/on-demand allocation of TLS
- dynamic loading of libgcc_s.so at runtime in pthread_cancel
- etc.

On 64-bit machines, split-stack is 100% useless. You can get the same
behavior (crashing on OOM, but not having to know your stack size
ahead of time) by just turning on overcommit and using huge thread
stack sizes; the enormous 64-bit virtual address space makes it so you
don't have to worry about running out of virtual memory.
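
A minimal sketch of the "huge thread stack" approach, assuming
nothing beyond standard pthreads. The 16M buffer and the function
names are illustrative. The thread touches one byte per 4096-byte
step, so only the part of the stack actually used gets backed by
memory.

```c
#include <pthread.h>
#include <stddef.h>

#define BIG (16u << 20) /* 16M of stack use, more than a typical default */

static void *use_stack(void *arg)
{
	volatile char buf[BIG];
	/* Touch one byte every 4096 bytes so the kernel commits the pages. */
	for (size_t i = 0; i < BIG; i += 4096) buf[i] = 1;
	*(int *)arg = buf[0] + buf[BIG - 4096];
	return 0;
}

/* Runs use_stack on a thread with the requested stack size.
 * Returns 0 on success, nonzero on any failure. */
int run_with_stack(size_t size)
{
	pthread_attr_t attr;
	pthread_t t;
	int out = 0, err;

	if ((err = pthread_attr_init(&attr))) return err;
	err = pthread_attr_setstacksize(&attr, size);
	if (!err) err = pthread_create(&t, &attr, use_stack, &out);
	pthread_attr_destroy(&attr);
	if (err) return err;

	pthread_join(t, 0);
	return out == 2 ? 0 : -1;
}
```

For example, run_with_stack((size_t)64 << 20) asks for a 64M stack of
which only 16M is ever touched.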

On 32-bit machines where virtual addresses are a precious resource,
split-stack is a clever hack that essentially allows you to
over-commit not just physical memory but virtual memory too. But it's
inherently non-robust, and even worse than physical memory overcommit.
At least in the latter case, the kernel can be intelligent about
choosing an "abusive" process to kill. But if you run out of virtual
memory, nothing can be done but terminating the whole process (you
can't just terminate a single thread because it will leave resources
in an inconsistent state).

As such, I'm willing to add whatever inexpensive support framework is
needed so that people who want to use split-stack can use it, but I'm
very wary of invasive or costly changes to support a feature which I
believe is fundamentally misguided (and, for 64-bit targets, utterly
useless).

Rich



* Re: TLS (thread-local storage) support
  2012-10-16 22:09           ` Szabolcs Nagy
@ 2012-10-16 23:16             ` boris brezillon
  2012-10-17 10:37               ` Szabolcs Nagy
  2012-10-16 23:29             ` Rich Felker
  1 sibling, 1 reply; 26+ messages in thread
From: boris brezillon @ 2012-10-16 23:16 UTC (permalink / raw)
  To: musl

2012/10/17 Szabolcs Nagy <nsz@port70.net>:
> * boris brezillon <b.brezillon.musl@gmail.com> [2012-10-16 23:47:52 +0200]:
>> > There's at least one thing (maybe more) missing for go support with
>> > musl : gcc 'split-stack' support (see http://blog.nella.org/?p=849 and
>> > http://gcc.gnu.org/wiki/SplitStacks).
>> >
>
> why does go need support from libc?

You're right:
1) I was talking about gccgo but I realized there's another compiler
(gc go) which does not rely on gcc at all.
2) split stack is not mandatory for gccgo (see libgo/configure.ac in
gcc sources)

But it's still possible to enable split-stack and in this case go
runtime relies on some libc functions (see libgcc/generic-morestack*).

>
> it has its own runtime and libraries on raw syscalls
>
>> 4) Compile musl with '-fsplit-stack' and add no_split_stack attribute
>> to appropriate functions (at least all functions called before
>> pthread_self_init because %gs or %fs register is unusable before this
>> call).
>>
>
> what does a no_split_stack function do when it runs out of stack?

Segfault.

The no_split_stack attribute is used for leaf functions, or for function
call trees where the maximum stack usage never exceeds the space reserved
for extra stack chunk allocation (I don't remember the exact value).
>
> most functions in musl may be run before pthread_self_init
> (it runs on demand when a pthread function is used)
This can be done during the dynamic linking process (by checking the
split stack note).
>
> what's the use of split stack if some functions may not work with it?
Only the explicitly specified functions (no_split_stack attribute)
won't include the split stack prologue.
It is the developer's responsibility to carefully choose which ones
to tag as 'no_split_stack'.
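
For reference, tagging a function looks roughly like this. The
function itself is a made-up example; no_split_stack is the
attribute name gcc documents.

```c
/* Small leaf function: fixed, tiny stack usage and no calls, so it is
 * a reasonable candidate for skipping the split-stack prologue. */
__attribute__((no_split_stack))
int leaf_add(int a, int b)
{
	int tmp[4] = { a, b, 0, 0 }; /* a few bytes of stack, nothing more */
	tmp[2] = tmp[0] + tmp[1];
	return tmp[2];
}
```

Without -fsplit-stack the attribute has no effect on the generated
code, so the same source builds either way.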



* Re: TLS (thread-local storage) support
  2012-10-16 22:09           ` Szabolcs Nagy
  2012-10-16 23:16             ` boris brezillon
@ 2012-10-16 23:29             ` Rich Felker
  1 sibling, 0 replies; 26+ messages in thread
From: Rich Felker @ 2012-10-16 23:29 UTC (permalink / raw)
  To: musl

On Wed, Oct 17, 2012 at 12:09:22AM +0200, Szabolcs Nagy wrote:
> most functions in musl may be run before pthread_self_init
> (it runs on demand when a pthread function is used)

This is tangential, but I've been considering changing that for a long
time. My thought is to have startup code always attempt to setup the
thread pointer (except in static binaries where it's statically
determined that nothing will use it). If it failed with ENOSYS
(missing syscall due to old kernel), musl would save a flag indicating
such and have minimal support code to prevent crashing when using
"plain libc" functions that have nothing to do with threads, so that
old/simple software can run even on Linux 2.4. If it failed with any
other reason (shouldn't be able to happen, but Linux is always
introducing stupid resource-exhaustion reasons things can fail...)
a_crash would be called before execution passes to the application
code.
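
The decision logic described here could be sketched as below. The
enum and function names are hypothetical, not actual musl code, and
the error is assumed to arrive as a positive errno-style value.

```c
#include <errno.h>

/* Possible outcomes of the startup attempt to set the thread pointer. */
enum tp_state {
	TP_READY,       /* thread pointer set up normally */
	TP_UNAVAILABLE, /* old kernel: run with minimal "plain libc" support */
	TP_FATAL        /* unexpected failure: crash before application code */
};

/* Maps the result of the setup attempt (0 or an errno value)
 * to the action the startup code would take. */
enum tp_state classify_tp_setup(int err)
{
	if (err == 0) return TP_READY;
	if (err == ENOSYS) return TP_UNAVAILABLE; /* syscall missing */
	return TP_FATAL;
}
```

Startup code would save a flag in the TP_UNAVAILABLE case and call
a_crash in the TP_FATAL case.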

Rich



* Re: TLS (thread-local storage) support
  2012-10-16 22:54           ` Rich Felker
@ 2012-10-16 23:39             ` boris brezillon
  2012-10-16 23:48               ` Rich Felker
  0 siblings, 1 reply; 26+ messages in thread
From: boris brezillon @ 2012-10-16 23:39 UTC (permalink / raw)
  To: musl

2012/10/17 Rich Felker <dalias@aerifal.cx>:
> On Tue, Oct 16, 2012 at 11:47:52PM +0200, boris brezillon wrote:
>> 2012/10/16 boris brezillon <b.brezillon.musl@gmail.com>:
>> > Hi,
>> >
>> > First I'd like to thank Rich for adding TLS support (I started to work
>> > on it a few weeks ago but never had time to finish it).
>> >
>> > 2012/10/6 Daniel Cegiełka <daniel.cegielka@gmail.com>:
>> >> 2012/10/5 Rich Felker <dalias@aerifal.cx>:
>> >>> On Thu, Oct 04, 2012 at 11:29:11PM +0200, Daniel Cegiełka wrote:
>> >>>> great news! Finally able to compile Go (lang)...
>> >>>
>> >>> Did Go fail with gcc's emulated TLS in libgcc?
>> >>
>> >> I tested Go with sabotage (with fresh musl). I'll try to do it again...
>> >> gcc in sabotage was compiled without support for TLS, so I didn't
>> >> expect that it will be successful:
>> >>
>> >> https://github.com/rofl0r/sabotage/blob/master/pkg/gcc4
>> >>
>> > There's at least one thing (maybe more) missing for go support with
>> > musl : gcc 'split-stack' support (see http://blog.nella.org/?p=849 and
>> > http://gcc.gnu.org/wiki/SplitStacks).
>> >
>> > I'm also interested in split stack support in musl but for other
>> > reasons (thread and coroutine stack automatic expansion).
>> >
>> > For x86/x86_64 split stack is implemented using a field inside the
>> > pthread struct which is accessed via %gs (or %fs for x86_64) and an
>> > offset.
>> >
>> > Currently this offset is defined at 0x30 (0x70 for x86_64) by the
>> > TARGET_THREAD_SPLIT_STACK_OFFSET but only if TARGET_LIBC_PROVIDES_SSP
>> > is defined (see gcc/config/i386/gnu-user.h or
>> > gcc/config/i386/gnu-user64.h).
>> >
>> > As far as I know musl does not support stack protection, but we could
>> > at least patch gcc to define TARGET_THREAD_SPLIT_STACK_OFFSET when
>> > using musl.
>> >
>> > We also need to reserve a field in the musl pthread struct. There are
>> > currently two fields named 'unused1' and 'unused2' but I'm not sure
>> > they're really unused in every supported arch.
>> >
>> >
>> > BTW, I'd like to work on a more integrated support of split stack in MUSL :
>
> I'm not a fan of split-stack for various reasons, but I have no
> objection to adding support to make it work as long as it's an
> optional feature that does not impair non-split-stack usage.
>
>> > 1) support in dynamic linker (see the last point of
>> > http://gcc.gnu.org/wiki/SplitStacks) : check split stack notes in
>> > shared libs (and program ?)
>
> It could be done, but is it really useful? There are infinitely many
> ways you can crash a program with libraries that were not built
> correctly for use with it. Checking for one of them seems like
> gratuitous complexity with little benefit.
>
>> > 2) support in thread implementation : currently when a thread is
>> > created the stack limit is set afterward (see
>> > https://github.com/mirrors/gcc/blob/master/libgcc/generic-morestack-thread.c
>> > and https://github.com/mirrors/gcc/blob/master/libgcc/config/i386/morestack.S)
>> > and the stack size is supposed to be 16K (which is the minimum stack
>> > size). This means we may reallocate a new stack chunk even if the
>> > previous one (the first one) is not fully used.
>> > If stack limit is set by thread implementation, this can be set
>> > appropriately according to the stack size defined by the thread
>> > creator.
>
> That's perfectly reasonable to support.
>
>> > 3) more optimizations I haven't thought about yet...
>> >
>> 4) Compile musl with '-fsplit-stack' and add no_split_stack attribute
>> to appropriate functions (at least all functions called before
>> pthread_self_init because %gs or %fs register is unusable before this
>> call).
>
> This is definitely not desirable, at least not by default. It hurts
> performance, possibly a lot, and destroys async-signal-safety. Also I
> doubt it's needed. As long as split stack mode leaves at least ~8k
> when calling a new function, most if not all functions in musl should
> run fine without needing support for enlarging the stack.
I agree. This should be made optional. But if we don't compile libc
with -fsplit-stack (i.e. we build it with -fno-split-stack), each call
to a libc function from an external function compiled with split stack
may lead to a 64K stack chunk allocation.
>
>> 5) set main thread stack limit to 0 (pthread_self_init) : the main
>> thread stack grow is handled by the kernel.
>>
>> 6) add no-split-stack note to every asm file.
>
> I'm against this, or any boilerplate clutter. If it's really needed,
> it should be possible with CFLAGS (or "ASFLAGS"), rather than
> modifying every file, and if there's no way to do it with command line
> options, that's a bug in gas.
Not supported in gas, already tried.
>
> With that said, why would it be needed? I don't think there are any
> asm files that use more than 32 bytes of stack...
Same reason as 4) : 64K stack chunk allocation.
>
>> 7) make split stack support optional (either by checking the
>> -fsplit-stack option in CFLAGS or with a specific option :
>> --enable-split-stack) : split stack adds overhead to every functions
>> (except for those with the 'no_split_stack' attribute).
>>
>> > Do you have any concern about adding those features in musl ?
>
> Basically, the whole idea of split-stack is antithetical to the QoI
> guarantees of musl. A program using split-stack can crash at any time
> due to out-of-memory, and there is no reliable/portable way to recover
> from this condition. It's much like the following low-quality aspects
> of glibc and default Linux config:
The same program may crash because of a stack overflow (segfault) or,
worse, corrupt memory.
At best, split stack provides a way to grow a thread's stack without
crashing the whole process.
At worst it crashes the program, but it never corrupts memory.
>
> - overcommit
> - lazy allocation of libc-internal storage
> - lazy/on-demand allocation of TLS
> - dynamic loading of libgcc_s.so at runtime in pthread_cancel
> - etc.
>
> On 64-bit machines, split-stack is 100% useless. You can get the same
> behavior (crashing on OOM, but not having to know your stack size
> ahead of time) by just turning on overcommit and using huge thread
> stack sizes; the enormous 64-bit virtual address space makes it so you
> don't have to worry about running out of virtual memory.
>
> On 32-bit machines where virtual addresses are a precious resource,
> split-stack is a clever hack that essentially allows you to
> over-commit not just physical memory but virtual memory too. But it's
> inherently non-robust, and even worse than physical memory overcommit.
> At least in the latter case, the kernel can be intelligent about
> choosing an "abusive" process to kill. But if you run out of virtual
> memory, nothing can be done but terminating the whole process (you
> can't just terminate a single thread because it will leave resources
> in an inconsistent state).
>
> As such, I'm willing to add whatever inexpensive support framework is
> needed so that people who want to use split-stack can use it, but I'm
> very wary of invasive or costly changes to support a feature which I
> believe is fundamentally misguided (and, for 64-bit targets, utterly
> useless).

I understand.

>
> Rich



* Re: TLS (thread-local storage) support
  2012-10-16 23:39             ` boris brezillon
@ 2012-10-16 23:48               ` Rich Felker
  2012-10-17  0:08                 ` boris brezillon
  0 siblings, 1 reply; 26+ messages in thread
From: Rich Felker @ 2012-10-16 23:48 UTC (permalink / raw)
  To: musl

On Wed, Oct 17, 2012 at 01:39:49AM +0200, boris brezillon wrote:
> >> 4) Compile musl with '-fsplit-stack' and add no_split_stack attribute
> >> to appropriate functions (at least all functions called before
> >> pthread_self_init because %gs or %fs register is unusable before this
> >> call).
> >
> > This is definitely not desirable, at least not by default. It hurts
> > performance, possibly a lot, and destroys async-signal-safety. Also I
> > doubt it's needed. As long as split stack mode leaves at least ~8k
> > when calling a new function, most if not all functions in musl should
> > run fine without needing support for enlarging the stack.
> I agree. This should be made optional. But if we don't compile libc
> with fsplit-stack (-fnosplit-stack).
> Each call to a libc func from an external func compiled with split
> stack may lead to a 64K stack chunk alloc.

Where does this allocation take place from? There should simply be a
way to inhibit it.

> >> 6) add no-split-stack note to every asm file.
> >
> > I'm against this, or any boilerplate clutter. If it's really needed,
> > it should be possible with CFLAGS (or "ASFLAGS"), rather than
> > modifying every file, and if there's no way to do it with command line
> > options, that's a bug in gas.
> Not supported in gas, already tried.

That's frustrating..

> > Basically, the whole idea of split-stack is antithetical to the QoI
> > guarantees of musl. A program using split-stack can crash at any time
> > due to out-of-memory, and there is no reliable/portable way to recover
> > from this condition. It's much like the following low-quality aspects
> > of glibc and default Linux config:
> The same program may crash because of stack overflow (segfault) or
> worst : corrupt memory.

Only if written improperly. A correctly written program has bounded
stack usage that's easily proven correct with static analysis.
Unbounded stack usage is a bug, plain and simple, because there's no
way to safely and portably handle the runtime error of running out of
memory.

> At best the split stack provides a way to increase the thread without
> crashing the whole process.

If you're comparing the behavior of a program with initial
thread-stack size N and no-split-stack to a program with initial
thread-stack size N that can also obtain additional stack space with
split-stack, and you don't have static bounds on your stack usage that
keep it below N, then I agree that the latter will succeed in cases
where the former crashes. On the other hand, both programs WILL CRASH
under appropriate conditions, and as such, they are both buggy
programs.
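
To make "initial thread-stack size N" concrete, here is a minimal
sketch, assuming POSIX threads, of creating a thread with an explicitly
chosen stack size. The helper name run_with_stack_size and the example
thread body add_one are hypothetical and not part of musl; the point is
only that N is a parameter the thread creator picks up front.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical helper: run fn(arg) on a new thread whose stack is
 * stack_size bytes, wait for it, and store its return value in *result.
 * Returns 0 on success or a pthread error number. */
int run_with_stack_size(size_t stack_size, void *(*fn)(void *),
                        void *arg, void **result)
{
	pthread_attr_t attr;
	pthread_t td;
	int err;

	err = pthread_attr_init(&attr);
	if (err) return err;

	/* Fails with EINVAL if stack_size is below PTHREAD_STACK_MIN. */
	err = pthread_attr_setstacksize(&attr, stack_size);
	if (!err) err = pthread_create(&td, &attr, fn, arg);
	pthread_attr_destroy(&attr);
	if (err) return err;

	return pthread_join(td, result);
}

/* Example thread body with small, statically bounded stack usage. */
void *add_one(void *arg)
{
	int *p = arg;
	*p += 1;
	return p;
}
```

A caller with a proven bound on stack usage would pass that bound (plus
a margin) as stack_size, e.g. run_with_stack_size(1 << 20, add_one, &v,
&res).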

> At worst it crash the program but never corrupt the memory.

Memory corruption will not happen without split stack either unless
you turn off guard pages or use functions with huge stack frames
without the -fstack-check option.
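
As a rough illustration of the guard-page setting referred to here, the
sketch below uses the POSIX pthread_attr_setguardsize and
pthread_attr_getguardsize calls on a thread attribute object. The
helper name guard_size_roundtrip is hypothetical. It only sets and
reads back the attribute; it does not create a thread.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical helper: set the guard size on a fresh attribute object
 * and read it back. Returns 0 on success or a pthread error number. */
int guard_size_roundtrip(size_t requested, size_t *actual)
{
	pthread_attr_t attr;
	int err = pthread_attr_init(&attr);
	if (err) return err;

	/* A guard size of 0 asks for no guard area at all, which is the
	 * "turn off guard pages" case where an overflow can run into
	 * adjacent memory instead of faulting. */
	err = pthread_attr_setguardsize(&attr, requested);
	if (!err) err = pthread_attr_getguardsize(&attr, actual);

	pthread_attr_destroy(&attr);
	return err;
}
```

Leaving the guard size at its default (or larger) keeps the faulting
behavior for ordinary overflows; functions with frames larger than the
guard area are the case that -fstack-check is meant to cover.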

> > As such, I'm willing to add whatever inexpensive support framework is
> > needed so that people who want to use split-stack can use it, but I'm
> > very wary of invasive or costly changes to support a feature which I
> > believe is fundamentally misguided (and, for 64-bit targets, utterly
> > useless).
> 
> I understand.

Getting into it more, I think split-stack is a lot harder to support
than anybody has considered, especially if you want to still have a
POSIX conforming environment. There are all sorts of nasty cases
connected to signal handlers, async-signal-safety,
async-cancel-safety, longjmp, and thread cancellation where I know at
the very least you would need some ugly bloated hacks with unwinding
to get them right, and where I'm doubtful you even _can_ make them
100% conforming. Getting this stuff right is highly non-trivial to
begin with, even without split-stack (and glibc doesn't really even
try) so I'm doubtful that the architects of split-stack even thought
about it before throwing their experiment out there for everybody to
use...

Rich



* Re: TLS (thread-local storage) support
  2012-10-16 23:48               ` Rich Felker
@ 2012-10-17  0:08                 ` boris brezillon
  2012-10-17  0:42                   ` Rich Felker
  0 siblings, 1 reply; 26+ messages in thread
From: boris brezillon @ 2012-10-17  0:08 UTC (permalink / raw)
  To: musl

2012/10/17 Rich Felker <dalias@aerifal.cx>:
> On Wed, Oct 17, 2012 at 01:39:49AM +0200, boris brezillon wrote:
>> >> 4) Compile musl with '-fsplit-stack' and add no_split_stack attribute
>> >> to appropriate functions (at least all functions called before
>> >> pthread_self_init because %gs or %fs register is unusable before this
>> >> call).
>> >
>> > This is definitely not desirable, at least not by default. It hurts
>> > performance, possibly a lot, and destroys async-signal-safety. Also I
>> > doubt it's needed. As long as split stack mode leaves at least ~8k
>> > when calling a new function, most if not all functions in musl should
>> > run fine without needing support for enlarging the stack.
>> I agree. This should be made optional. But if we don't compile libc
>> with fsplit-stack (-fnosplit-stack).
>> Each call to a libc func from an external func compiled with split
>> stack may lead to a 64K stack chunk alloc.
>
> Where does this allocation take place from? There should simply be a
> way to inhibit it.
In the linker (gold linker).
>
>> >> 6) add no-split-stack note to every asm file.
>> >
>> > I'm against this, or any boilerplate clutter. If it's really needed,
>> > it should be possible with CFLAGS (or "ASFLAGS"), rather than
>> > modifying every file, and if there's no way to do it with command line
>> > options, that's a bug in gas.
>> Not supported in gas, already tried.
>
> That's frustrating..
>
>> > Basically, the whole idea of split-stack is antithetical to the QoI
>> > guarantees of musl. A program using split-stack can crash at any time
>> > due to out-of-memory, and there is no reliable/portable way to recover
>> > from this condition. It's much like the following low-quality aspects
>> > of glibc and default Linux config:
>> The same program may crash because of stack overflow (segfault) or
>> worst : corrupt memory.
>
> Only if written improperly. A correctly written program has bounded
> stack usage that's easily proven correct with static analysis.
> Unbounded stack usage is a bug, plain and simple, because there's no
> way to safely and portably handle the runtime error of running out of
> memory.
>
>> At best the split stack provides a way to increase the thread without
>> crashing the whole process.
>
> If you're comparing the behavior of a program with initial
> thread-stack size N and no-split-stack to a program with initial
> thread-stack size N that can also obtain additional stack space with
> split-stack, and you don't have static bounds on your stack usage that
> keep it below N, then I agree that the latter will succeed in cases
> where the former crashes. On the other hand, both programs WILL CRASH
> under appropriate conditions, and as such, they are both buggy
> programs.
>
>> At worst it crash the program but never corrupt the memory.
>
> Memory corruption will not happen without split stack either unless
> you turn off guard pages or use functions with huge stack frames
> without the -fstack-check option.
>
>> > As such, I'm willing to add whatever inexpensive support framework is
>> > needed so that people who want to use split-stack can use it, but I'm
>> > very wary of invasive or costly changes to support a feature which I
>> > believe is fundamentally misguided (and, for 64-bit targets, utterly
>> > useless).
>>
>> I understand.
>
> Getting into it more, I think split-stack is a lot harder to support
> than anybody has considered, especially if you want to still have a
> POSIX conforming environment. There are all sorts of nasty cases
> connected to signal handlers, async-signal-safety,
> async-cancel-safety, longjmp, and thread cancellation where I know at
> the very least you would need some ugly bloated hacks with unwinding
> to get them right, and where I'm doubtful you even _can_ make them
> 100% conforming. Getting this stuff right is highly non-trivial to
> begin with, even without split-stack (and glibc doesn't really even
> try) so I'm doubtful that the architects of split-stack even thought
> about it before throwing their experiment out there for everybody to
> use...
>
> Rich



* Re: TLS (thread-local storage) support
  2012-10-17  0:08                 ` boris brezillon
@ 2012-10-17  0:42                   ` Rich Felker
  2012-10-17  1:03                     ` boris brezillon
  2012-10-17  1:49                     ` boris brezillon
  0 siblings, 2 replies; 26+ messages in thread
From: Rich Felker @ 2012-10-17  0:42 UTC (permalink / raw)
  To: musl

On Wed, Oct 17, 2012 at 02:08:11AM +0200, boris brezillon wrote:
> >> I agree. This should be made optional. But if we don't compile libc
> >> with fsplit-stack (-fnosplit-stack).
> >> Each call to a libc func from an external func compiled with split
> >> stack may lead to a 64K stack chunk alloc.
> >
> > Where does this allocation take place from? There should simply be a
> > way to inhibit it.
> In the linker (gold linker).

Well gold isn't running at runtime. I assume you mean it _arranges_
for this allocation to take place somehow, and what I'm wondering is
whether there's a way to avoid that.

Rich



* Re: TLS (thread-local storage) support
  2012-10-17  0:42                   ` Rich Felker
@ 2012-10-17  1:03                     ` boris brezillon
  2012-10-17  1:49                     ` boris brezillon
  1 sibling, 0 replies; 26+ messages in thread
From: boris brezillon @ 2012-10-17  1:03 UTC (permalink / raw)
  To: musl

2012/10/17 Rich Felker <dalias@aerifal.cx>:
> On Wed, Oct 17, 2012 at 02:08:11AM +0200, boris brezillon wrote:
>> >> I agree. This should be made optional. But if we don't compile libc
>> >> with fsplit-stack (-fnosplit-stack).
>> >> Each call to a libc func from an external func compiled with split
>> >> stack may lead to a 64K stack chunk alloc.
>> >
>> > Where does this allocation take place from? There should simply be a
>> > way to inhibit it.
>> In the linker (gold linker).
>
> Well gold isn't running at runtime. I assume you mean it _arranges_
> for this allocation to take place somehow, and that's what I'm
> wondering about whether there's a way to avoid.
Sorry,
this is done in __morestack_non_split (libgcc/config/i386/morestack.S).
The linker replaces the __morestack call in the no_split_stack
function's caller with __morestack_non_split.
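
For reference, this is roughly what marking a single function with
GCC's no_split_stack attribute looks like. It is a sketch only: the
NO_SPLIT_STACK macro and the sum_bytes function are hypothetical names,
and the attribute is applied only when the compiler reports support
for it.

```c
/* Use the attribute only where the compiler advertises it. */
#if defined(__has_attribute)
# if __has_attribute(no_split_stack)
#  define NO_SPLIT_STACK __attribute__((no_split_stack))
# endif
#endif
#ifndef NO_SPLIT_STACK
# define NO_SPLIT_STACK
#endif

/* Hypothetical leaf function. When the file is built with
 * -fsplit-stack, the attribute asks GCC not to emit the split-stack
 * prologue (the stack-limit check that may call __morestack) for it. */
NO_SPLIT_STACK int sum_bytes(const unsigned char *p, int n)
{
	int s = 0;
	for (int i = 0; i < n; i++) s += p[i];
	return s;
}
```

Without -fsplit-stack the attribute has no effect on the generated
code, so the same source builds either way.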



>
> Rich



* Re: TLS (thread-local storage) support
  2012-10-17  0:42                   ` Rich Felker
  2012-10-17  1:03                     ` boris brezillon
@ 2012-10-17  1:49                     ` boris brezillon
  2012-10-17  1:58                       ` Rich Felker
  1 sibling, 1 reply; 26+ messages in thread
From: boris brezillon @ 2012-10-17  1:49 UTC (permalink / raw)
  To: musl

2012/10/17 Rich Felker <dalias@aerifal.cx>:
> On Wed, Oct 17, 2012 at 02:08:11AM +0200, boris brezillon wrote:
>> >> I agree. This should be made optional. But if we don't compile libc
>> >> with fsplit-stack (-fnosplit-stack).
>> >> Each call to a libc func from an external func compiled with split
>> >> stack may lead to a 64K stack chunk alloc.
>> >
>> > Where does this allocation take place from? There should simply be a
>> > way to inhibit it.
>> In the linker (gold linker).
>
> Well gold isn't running at runtime. I assume you mean it _arranges_
> for this allocation to take place somehow, and that's what I'm
> wondering about whether there's a way to avoid.

The easiest way to avoid big stack chunk allocation is to compile musl
with the -fno-split-stack option.
This will not add any overhead to functions (no split-stack prologue),
and it will add a note to the shared object which tells the linker
to skip the __morestack to __morestack_non_split replacement.

>
> Rich



* Re: TLS (thread-local storage) support
  2012-10-17  1:49                     ` boris brezillon
@ 2012-10-17  1:58                       ` Rich Felker
  2012-10-17  7:48                         ` musl
  0 siblings, 1 reply; 26+ messages in thread
From: Rich Felker @ 2012-10-17  1:58 UTC (permalink / raw)
  To: musl

On Wed, Oct 17, 2012 at 03:49:33AM +0200, boris brezillon wrote:
> 2012/10/17 Rich Felker <dalias@aerifal.cx>:
> > On Wed, Oct 17, 2012 at 02:08:11AM +0200, boris brezillon wrote:
> >> >> I agree. This should be made optional. But if we don't compile libc
> >> >> with fsplit-stack (-fnosplit-stack).
> >> >> Each call to a libc func from an external func compiled with split
> >> >> stack may lead to a 64K stack chunk alloc.
> >> >
> >> > Where does this allocation take place from? There should simply be a
> >> > way to inhibit it.
> >> In the linker (gold linker).
> >
> > Well gold isn't running at runtime. I assume you mean it _arranges_
> > for this allocation to take place somehow, and that's what I'm
> > wondering about whether there's a way to avoid.
> 
> The easiest way to avoid big stack chunk allocation is to compile musl
> with -fno-split-stack option.
> This will not add any overhead to functions (no split stack prolog)
> And this will add a note to the shared object which tells the linker
> to avoid __morestack to __morestack_non_split replacement.

Where is this documented? The GCC manual doesn't mention anything
about -fno-split-stack having special behavior like that, so for lack
of any documentation otherwise, it "should" just be the option to turn
off -fsplit-stack..

I'm not claiming you're wrong, just that this all seems poorly
documented.

Rich



* Re: TLS (thread-local storage) support
  2012-10-17  1:58                       ` Rich Felker
@ 2012-10-17  7:48                         ` musl
  0 siblings, 0 replies; 26+ messages in thread
From: musl @ 2012-10-17  7:48 UTC (permalink / raw)
  To: musl

On 17/10/2012 03:58, Rich Felker wrote:
> On Wed, Oct 17, 2012 at 03:49:33AM +0200, boris brezillon wrote:
>> 2012/10/17 Rich Felker <dalias@aerifal.cx>:
>>> On Wed, Oct 17, 2012 at 02:08:11AM +0200, boris brezillon wrote:
>>>>>> I agree. This should be made optional. But if we don't compile libc
>>>>>> with fsplit-stack (-fnosplit-stack).
>>>>>> Each call to a libc func from an external func compiled with split
>>>>>> stack may lead to a 64K stack chunk alloc.
>>>>> Where does this allocation take place from? There should simply be a
>>>>> way to inhibit it.
>>>> In the linker (gold linker).
>>> Well gold isn't running at runtime. I assume you mean it _arranges_
>>> for this allocation to take place somehow, and that's what I'm
>>> wondering about whether there's a way to avoid.
>> The easiest way to avoid big stack chunk allocation is to compile musl
>> with -fno-split-stack option.
>> This will not add any overhead to functions (no split stack prolog)
>> And this will add a note to the shared object which tells the linker
>> to avoid __morestack to __morestack_non_split replacement.
> Where is this documented? The GCC manual doesn't mention anything
> about -fno-split-stack having special behavior like that, so for lack
> of any documentation otherwise, it "should" just be the option to turn
> off -fsplit-stack..
You're right, I misunderstood how -fno-split-stack was implemented.
I tried to compile a source file with -fno-split-stack and didn't find
any 'no-split-stack' note in the generated object file.
When I compile it with -fsplit-stack both 'no-split-stack' and
'split-stack' notes are added.


>
> I'm not claiming you're wrong, just that this all seems poorly
> documented.
>
> Rich




* Re: TLS (thread-local storage) support
  2012-10-16 23:16             ` boris brezillon
@ 2012-10-17 10:37               ` Szabolcs Nagy
  0 siblings, 0 replies; 26+ messages in thread
From: Szabolcs Nagy @ 2012-10-17 10:37 UTC (permalink / raw)
  To: musl

* boris brezillon <b.brezillon.musl@gmail.com> [2012-10-17 01:16:43 +0200]:
> > most functions in musl may be run before pthread_self_init
> > (it runs on demand when a pthread function is used)
> This can be done during dynamic linking process (by checking the split
> stack note).

i meant that you would need to annotate almost all musl
functions as no_split_stack because normally thread
pointer is not initialized
(but dalias commented that this might change)

the dynamic loader can only do the initialization for
dynamically linked executables

so it's easier to just not compile musl with -fsplit-stack

musl can easily give guarantees about its maximum stack
usage assuming there is a bound to function call overhead
and alignment overhead of auto variables etc

(but dalias already gave better explanation)



* Re: TLS (thread-local storage) support
  2012-10-16 21:47         ` boris brezillon
  2012-10-16 22:09           ` Szabolcs Nagy
  2012-10-16 22:54           ` Rich Felker
@ 2012-10-19 18:39           ` orc
  2012-10-19 18:41             ` Rich Felker
  2 siblings, 1 reply; 26+ messages in thread
From: orc @ 2012-10-19 18:39 UTC (permalink / raw)
  To: musl

On Tue, 16 Oct 2012 23:47:52 +0200
boris brezillon <b.brezillon.musl@gmail.com> wrote:

> 2012/10/16 boris brezillon <b.brezillon.musl@gmail.com>:
> > Hi,
> >
> > First I'd like to thank Rich for adding TLS support (I started to
> > work on it a few weeks ago but never had time to finish it).
> >
> > 2012/10/6 Daniel Cegiełka <daniel.cegielka@gmail.com>:
> >> 2012/10/5 Rich Felker <dalias@aerifal.cx>:
> >>> On Thu, Oct 04, 2012 at 11:29:11PM +0200, Daniel Cegiełka wrote:
> >>>> great news! Finally able to compile Go (lang)...
> >>>
> >>> Did Go fail with gcc's emulated TLS in libgcc?
> >>
> >> I tested Go with sabotage (with fresh musl). I'll try to do it
> >> again... gcc in sabotage was compiled without support for TLS, so
> >> I didn't expect that it will be successful:
> >>
> >> https://github.com/rofl0r/sabotage/blob/master/pkg/gcc4
> >>
> > There's at least one thing (maybe more) missing for go support with
> > musl : gcc 'split-stack' support (see http://blog.nella.org/?p=849
> > and http://gcc.gnu.org/wiki/SplitStacks).
> >
> > I'm also interested in split stack support in musl but for other
> > reasons (thread and coroutine stack automatic expansion).
> >
> > For x86/x86_64 split stack is implemented using a field inside the
> > pthread struct which is accessed via %fs (or %gs for x86_64) and an
> > offset.
> >
> > Currently this offset is defined at 0x30 (0x70 for x86_64) by the
> > TARGET_THREAD_SPLIT_STACK_OFFSET but only if
> > TARGET_LIBC_PROVIDES_SSP is defined (see gcc/config/i386/gnu-user.h
> > or gcc/config/i386/gnu-user64.h).
> >
> > As far as I know musl does not support stack protection, but we
> > could at least patch gcc to define TARGET_THREAD_SPLIT_STACK_OFFSET
> > when using musl.
> >
> > We also need to reserve a field in the musl pthread struct. There
> > are currently two fields named 'unused1' and 'unused2' but I'm not
> > sure they're really unused in every supported arch.
> >
> >
> > BTW, I'd like to work on a more integrated support of split stack
> > in MUSL :
> >
> > 1) support in dynamic linker (see the last point of
> > http://gcc.gnu.org/wiki/SplitStacks) : check split stack notes in
> > shared libs (and program ?)
> >
> > 2) support in thread implementation : currently when a thread is
> > created the stack limit is set afterward (see
> > https://github.com/mirrors/gcc/blob/master/libgcc/generic-morestack-thread.c
> > and
> > https://github.com/mirrors/gcc/blob/master/libgcc/config/i386/morestack.S)
> > and the stack size is supposed to be 16K (which is the minimum
> > stack size). This means we may reallocate a new stack chunk even if
> > the previous one (the first one) is not fully used. If stack limit
> > is set by thread implementation, this can be set appropriately
> > according to the stack size defined by the thread creator.
> >
> > 3) more optimizations I haven't thought about yet...
> >
> 4) Compile musl with '-fsplit-stack' and add no_split_stack attribute
> to appropriate functions (at least all functions called before
> pthread_self_init because %gs or %fs register is unusable before this
> call).
> 
> 5) set main thread stack limit to 0 (pthread_self_init) : the main
> thread stack grow is handled by the kernel.
> 
> 6) add no-split-stack note to every asm file.
Why anything works only after putting a weak spikes that break after a
slight touch?

> 
> 7) make split stack support optional (either by checking the
> -fsplit-stack option in CFLAGS or with a specific option :
> --enable-split-stack) : split stack adds overhead to every functions
> (except for those with the 'no_split_stack' attribute).
> 
> > Do you have any concern about adding those features in musl ?
> >
> > Let me know if you see other issues I haven't noticed.
> >
> >
> > Regards,
> >
> > Boris

After reading the whole thread I agree with Rich that this one is not
only hard to implement, but completely useless. From another point of
view: people expect musl to be code that not only works, but is easy to
read, understand, modify, debug and build. Why extend it with features
not even related to libc? (Is it mostly a hack from gcc-binutils again?)
Not to mention people who use (or will use) other compilers and
linkers.



* Re: TLS (thread-local storage) support
  2012-10-19 18:39           ` orc
@ 2012-10-19 18:41             ` Rich Felker
  0 siblings, 0 replies; 26+ messages in thread
From: Rich Felker @ 2012-10-19 18:41 UTC (permalink / raw)
  To: musl

On Sat, Oct 20, 2012 at 02:39:43AM +0800, orc wrote:
> > 4) Compile musl with '-fsplit-stack' and add no_split_stack attribute
> > to appropriate functions (at least all functions called before
> > pthread_self_init because %gs or %fs register is unusable before this
> > call).
> > 
> > 5) set main thread stack limit to 0 (pthread_self_init) : the main
> > thread stack grow is handled by the kernel.
> > 
> > 6) add no-split-stack note to every asm file.
> Why anything works only after putting a weak spikes that break after a
> slight touch?

I don't follow what you're saying here.

> > 7) make split stack support optional (either by checking the
> > -fsplit-stack option in CFLAGS or with a specific option :
> > --enable-split-stack) : split stack adds overhead to every functions
> > (except for those with the 'no_split_stack' attribute).
> > 
> > > Do you have any concern about adding those features in musl ?
> > >
> > > Let me know if you see other issues I haven't noticed.
> > >
> > >
> > > Regards,
> > >
> > > Boris
> 
> After reading whole thread I agree with Rich that this one is not only
> hard to implement, but completely useless. From other point of view:

I think it's hard (read: probably impossible) to implement in a way
that's robust and correct, but it may not be too hard to implement the
minimal support code so that folks who insist on using -fsplit-stack
will not get pathologically bad behavior due to the calling code being
unaware that it already has plenty of pre-allocated stack space to run
on.

> people expect from musl an easy to read and understand code, that not
> only works, but is easy to understand, modify, debug and build. Why
> extend it with features not even related to libc? (It is mostly a hack
> from gcc-binutils again?)

I agree. I definitely don't want to compromise on
correctness/robustness for the sake of this, and I'd also like to
avoid adding complexity or maintenance burdens.

Rich



end of thread, other threads:[~2012-10-19 18:41 UTC | newest]

Thread overview: 26+ messages
2012-10-04 21:13 TLS (thread-local storage) support Rich Felker
2012-10-04 21:29 ` Daniel Cegiełka
2012-10-04 22:36   ` Rich Felker
2012-10-06  8:17     ` Daniel Cegiełka
2012-10-16 21:27       ` boris brezillon
2012-10-16 21:47         ` boris brezillon
2012-10-16 22:09           ` Szabolcs Nagy
2012-10-16 23:16             ` boris brezillon
2012-10-17 10:37               ` Szabolcs Nagy
2012-10-16 23:29             ` Rich Felker
2012-10-16 22:54           ` Rich Felker
2012-10-16 23:39             ` boris brezillon
2012-10-16 23:48               ` Rich Felker
2012-10-17  0:08                 ` boris brezillon
2012-10-17  0:42                   ` Rich Felker
2012-10-17  1:03                     ` boris brezillon
2012-10-17  1:49                     ` boris brezillon
2012-10-17  1:58                       ` Rich Felker
2012-10-17  7:48                         ` musl
2012-10-19 18:39           ` orc
2012-10-19 18:41             ` Rich Felker
2012-10-05  3:04 ` Rich Felker
2012-10-05 17:27   ` Rich Felker
2012-10-06 14:33     ` Szabolcs Nagy
2012-10-06 20:39       ` Szabolcs Nagy
2012-10-06 20:58         ` Rich Felker
