* TLS (thread-local storage) support @ 2012-10-04 21:13 Rich Felker 2012-10-04 21:29 ` Daniel Cegiełka 2012-10-05 3:04 ` Rich Felker 0 siblings, 2 replies; 26+ messages in thread From: Rich Felker @ 2012-10-04 21:13 UTC (permalink / raw) To: musl Hi, I've committed the initial version of thread-local storage support (the __thread/_Thread_local keyword). So far, it only works in static-linked applications, and might or might not be working properly on arm, mips, and microblaze. The latter is a matter of whether these archs need "TLS variant I" instead of the much cleaner/saner "variant II" used by i386 and x86_64; unfortunately, Drepper's paper on TLS ABI omits most of the interesting archs in favor of dying or dead ones like Itanium, so I'm going to have to dig into other sources to find out if musl needs to special-case any or all of these. I also have the design for dynamic-linked TLS mostly worked out, but need to make some changes to the dynamic linker to get it integrated. Should be coming soon. Reports of success or of problems encountered are welcome, especially on non-x86 archs. Note that if you've been building gcc with --disable-tls, __thread was already working, but it is emulated (very poorly; it's slow and will abort() if it runs out of memory) through libgcc. Such compilers are useless for testing the new real TLS support, so rebuild without --disable-tls if needed before testing. Rich ^ permalink raw reply [flat|nested] 26+ messages in thread
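For anyone wanting a quick test case for the keyword described above, here is a minimal sketch. It assumes a compiler that accepts the GNU `__thread` spelling and a working pthread implementation; the names `counter`, `worker`, and `tls_demo` are made up for illustration and are not part of musl.

```c
#include <assert.h>
#include <pthread.h>

/* Each thread gets its own copy of this variable. `__thread` is the GNU
   spelling; C11 spells the same thing `_Thread_local`. */
static __thread int counter = 0;

/* Thread entry point: increment this thread's private copy 1000 times
   and report the final value through `arg`. */
static void *worker(void *arg)
{
    int *out = arg;
    for (int i = 0; i < 1000; i++)
        counter++;
    *out = counter;
    return 0;
}

/* Hypothetical helper: runs two workers and returns 0 if every copy of
   `counter` stayed independent, 1 if not, -1 if a thread could not start. */
int tls_demo(void)
{
    pthread_t t[2];
    int seen[2] = { -1, -1 };

    counter = 7; /* changes only the calling thread's copy */
    for (int i = 0; i < 2; i++)
        if (pthread_create(&t[i], 0, worker, &seen[i]) != 0)
            return -1;
    for (int i = 0; i < 2; i++)
        pthread_join(t[i], 0);

    /* Each worker started from its own zero-initialized copy. */
    return (seen[0] == 1000 && seen[1] == 1000 && counter == 7) ? 0 : 1;
}
```

Calling `tls_demo()` from `main` and checking for a zero return is enough for a smoke test. Since the message says only static linking is supported so far, such a test would presumably need to be linked statically.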
* Re: TLS (thread-local storage) support 2012-10-04 21:13 TLS (thread-local storage) support Rich Felker @ 2012-10-04 21:29 ` Daniel Cegiełka 2012-10-04 22:36 ` Rich Felker 2012-10-05 3:04 ` Rich Felker 1 sibling, 1 reply; 26+ messages in thread From: Daniel Cegiełka @ 2012-10-04 21:29 UTC (permalink / raw) To: musl great news! Finally able to compile Go (lang)... thx, Daniel ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: TLS (thread-local storage) support 2012-10-04 21:29 ` Daniel Cegiełka @ 2012-10-04 22:36 ` Rich Felker 2012-10-06 8:17 ` Daniel Cegiełka 0 siblings, 1 reply; 26+ messages in thread From: Rich Felker @ 2012-10-04 22:36 UTC (permalink / raw) To: musl On Thu, Oct 04, 2012 at 11:29:11PM +0200, Daniel Cegiełka wrote: > great news! Finally able to compile Go (lang)... Did Go fail with gcc's emulated TLS in libgcc? My impression is that it should usually/always work, but it's just very slow and low-quality (lazy allocation). This isn't gcc's fault, just the fact that it's impossible to emulate correctly. On the other hand, Go might be generating code that accesses TLS directly, in which case the emulation may not suffice. BTW, does Go work with static linking? If not, you might need to wait to celebrate until I add the dynamic-linked TLS support... Rich ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: TLS (thread-local storage) support 2012-10-04 22:36 ` Rich Felker @ 2012-10-06 8:17 ` Daniel Cegiełka 2012-10-16 21:27 ` boris brezillon 0 siblings, 1 reply; 26+ messages in thread From: Daniel Cegiełka @ 2012-10-06 8:17 UTC (permalink / raw) To: musl 2012/10/5 Rich Felker <dalias@aerifal.cx>: > On Thu, Oct 04, 2012 at 11:29:11PM +0200, Daniel Cegiełka wrote: >> great news! Finally able to compile Go (lang)... > > Did Go fail with gcc's emulated TLS in libgcc? I tested Go with sabotage (with fresh musl). I'll try to do it again... gcc in sabotage was compiled without support for TLS, so I didn't expect that it will be successful: https://github.com/rofl0r/sabotage/blob/master/pkg/gcc4 > My impression is that > it should usually/always work, but it's just very slow and > low-quality (lazy allocation). This isn't gcc's fault, just the fact > that it's impossible to emulate correctly. On the other hand, Go might > be generating code that accesses TLS directly, in which case the > emulation may not suffice. > > BTW, does Go work with static linking? If not, you might need to wait > to celebrate until I add the dynamic-linked TLS support... https://groups.google.com/forum/?fromgroups=#!topic/golang-nuts/N5QCFkXon0M Daniel > Rich ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: TLS (thread-local storage) support 2012-10-06 8:17 ` Daniel Cegiełka @ 2012-10-16 21:27 ` boris brezillon 2012-10-16 21:47 ` boris brezillon 0 siblings, 1 reply; 26+ messages in thread From: boris brezillon @ 2012-10-16 21:27 UTC (permalink / raw) To: musl Hi, First I'd like to thank Rich for adding TLS support (I started to work on it a few weeks ago but never had time to finish it). 2012/10/6 Daniel Cegiełka <daniel.cegielka@gmail.com>: > 2012/10/5 Rich Felker <dalias@aerifal.cx>: >> On Thu, Oct 04, 2012 at 11:29:11PM +0200, Daniel Cegiełka wrote: >>> great news! Finally able to compile Go (lang)... >> >> Did Go fail with gcc's emulated TLS in libgcc? > > I tested Go with sabotage (with fresh musl). I'll try to do it again... > gcc in sabotage was compiled without support for TLS, so I didn't > expect that it will be successful: > > https://github.com/rofl0r/sabotage/blob/master/pkg/gcc4 > There's at least one thing (maybe more) missing for Go support with musl: gcc 'split-stack' support (see http://blog.nella.org/?p=849 and http://gcc.gnu.org/wiki/SplitStacks). I'm also interested in split stack support in musl but for other reasons (automatic expansion of thread and coroutine stacks). For x86/x86_64, split stack is implemented using a field inside the pthread struct which is accessed via %gs (or %fs for x86_64) and an offset. Currently this offset is defined as 0x30 (0x70 for x86_64) by TARGET_THREAD_SPLIT_STACK_OFFSET, but only if TARGET_LIBC_PROVIDES_SSP is defined (see gcc/config/i386/gnu-user.h or gcc/config/i386/gnu-user64.h). As far as I know musl does not support stack protection, but we could at least patch gcc to define TARGET_THREAD_SPLIT_STACK_OFFSET when using musl. We also need to reserve a field in the musl pthread struct. There are currently two fields named 'unused1' and 'unused2' but I'm not sure they're really unused in every supported arch. 
BTW, I'd like to work on a more integrated support of split stack in MUSL : 1) support in dynamic linker (see the last point of http://gcc.gnu.org/wiki/SplitStacks) : check split stack notes in shared libs (and program ?) 2) support in thread implementation : currently when a thread is created the stack limit is set afterward (see https://github.com/mirrors/gcc/blob/master/libgcc/generic-morestack-thread.c and https://github.com/mirrors/gcc/blob/master/libgcc/config/i386/morestack.S) and the stack size is supposed to be 16K (which is the minimum stack size). This means we may reallocate a new stack chunk even if the previous one (the first one) is not fully used. If stack limit is set by thread implementation, this can be set appropriately according to the stack size defined by the thread creator. 3) more optimizations I haven't thought about yet... Do you have any concern about adding those features in musl ? Let me know if you see other issues I haven't noticed. Regards, Boris ^ permalink raw reply [flat|nested] 26+ messages in thread
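To make the fixed-offset field and the "limit set by the thread implementation" idea more concrete, here is a rough sketch. Everything in it is an assumption for illustration: `struct fake_tcb` is not musl's pthread struct, the 14 reserved slots exist only to place the field at the x86_64 offset 0x70 quoted above, and `needs_more_stack` only approximates the comparison a split-stack prologue performs before calling into libgcc's morestack code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical thread control block. Fourteen 8-byte slots precede the
   limit so that it lands at byte offset 14 * 8 = 0x70. */
struct fake_tcb {
    uint64_t reserved[14];
    uint64_t split_stack_limit;
};

_Static_assert(offsetof(struct fake_tcb, split_stack_limit) == 0x70,
               "stack limit must sit at the offset the compiler expects");

/* What a thread implementation could do at thread creation: derive the
   limit from the real lowest stack address plus a safety reserve, instead
   of assuming a fixed 16K stack. */
void set_stack_limit(struct fake_tcb *tcb, uint64_t stack_lowest,
                     uint64_t reserve)
{
    tcb->split_stack_limit = stack_lowest + reserve;
}

/* Approximation of the prologue check: the stack grows down, so a new
   chunk is needed when the frame would dip below the limit. Written as
   an addition to avoid unsigned underflow of sp - frame_size. */
int needs_more_stack(const struct fake_tcb *tcb, uint64_t sp,
                     uint64_t frame_size)
{
    return sp < tcb->split_stack_limit + frame_size;
}
```

With a limit derived from the actual stack size, a thread created with a large stack would not request a new chunk while most of its first chunk is still free, which is the waste described in point 2.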
* Re: TLS (thread-local storage) support 2012-10-16 21:27 ` boris brezillon @ 2012-10-16 21:47 ` boris brezillon 2012-10-16 22:09 ` Szabolcs Nagy ` (2 more replies) 0 siblings, 3 replies; 26+ messages in thread From: boris brezillon @ 2012-10-16 21:47 UTC (permalink / raw) To: musl 2012/10/16 boris brezillon <b.brezillon.musl@gmail.com>: > Hi, > > First I'd like to thank Rich for adding TLS support (I started to work > on it a few weeks ago but never had time to finish it). > > 2012/10/6 Daniel Cegiełka <daniel.cegielka@gmail.com>: >> 2012/10/5 Rich Felker <dalias@aerifal.cx>: >>> On Thu, Oct 04, 2012 at 11:29:11PM +0200, Daniel Cegiełka wrote: >>>> great news! Finally able to compile Go (lang)... >>> >>> Did Go fail with gcc's emulated TLS in libgcc? >> >> I tested Go with sabotage (with fresh musl). I'll try to do it again... >> gcc in sabotage was compiled without support for TLS, so I didn't >> expect that it will be successful: >> >> https://github.com/rofl0r/sabotage/blob/master/pkg/gcc4 >> > There's at least one thing (maybe more) missing for go support with > musl : gcc 'split-stack' support (see http://blog.nella.org/?p=849 and > http://gcc.gnu.org/wiki/SplitStacks). > > I'm also interested in split stack support in musl but for other > reasons (thread and coroutine stack automatic expansion). > > For x86/x86_64 split stack is implemented using a field inside the > pthread struct which is accessed via %fs (or %gs for x86_64) and an > offset. > > Currently this offset is defined at 0x30 (0x70 for x86_64) by the > TARGET_THREAD_SPLIT_STACK_OFFSET but only if TARGET_LIBC_PROVIDES_SSP > is defined (see gcc/config/i386/gnu-user.h or > gcc/config/i386/gnu-user64.h). > > As far as I know musl does not support stack protection, but we could > at least patch gcc to define TARGET_THREAD_SPLIT_STACK_OFFSET when > using musl. > > We also need to reserve a field in the musl pthread struct. 
There are > currently two fields named 'unused1' and 'unused2' but I'm not sure > they're really unused in every supported arch. > > > BTW, I'd like to work on a more integrated support of split stack in MUSL : > > 1) support in dynamic linker (see the last point of > http://gcc.gnu.org/wiki/SplitStacks) : check split stack notes in > shared libs (and program ?) > > 2) support in thread implementation : currently when a thread is > created the stack limit is set afterward (see > https://github.com/mirrors/gcc/blob/master/libgcc/generic-morestack-thread.c > and https://github.com/mirrors/gcc/blob/master/libgcc/config/i386/morestack.S) > and the stack size is supposed to be 16K (which is the minimum stack > size). This means we may reallocate a new stack chunk even if the > previous one (the first one) is not fully used. > If stack limit is set by thread implementation, this can be set > appropriately according to the stack size defined by the thread > creator. > > 3) more optimizations I haven't thought about yet... > 4) Compile musl with '-fsplit-stack' and add the no_split_stack attribute to appropriate functions (at least all functions called before pthread_self_init, because the %gs or %fs register is unusable before this call). 5) Set the main thread stack limit to 0 (in pthread_self_init): growth of the main thread stack is handled by the kernel. 6) Add a no-split-stack note to every asm file. 7) Make split stack support optional (either by checking for the -fsplit-stack option in CFLAGS or with a specific option: --enable-split-stack): split stack adds overhead to every function (except for those with the 'no_split_stack' attribute). > Do you have any concern about adding those features in musl ? > > Let me know if you see other issues I haven't noticed. > > > Regards, > > Boris ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: TLS (thread-local storage) support 2012-10-16 21:47 ` boris brezillon @ 2012-10-16 22:09 ` Szabolcs Nagy 2012-10-16 23:16 ` boris brezillon 2012-10-16 23:29 ` Rich Felker 2012-10-16 22:54 ` Rich Felker 2012-10-19 18:39 ` orc 2 siblings, 2 replies; 26+ messages in thread From: Szabolcs Nagy @ 2012-10-16 22:09 UTC (permalink / raw) To: musl * boris brezillon <b.brezillon.musl@gmail.com> [2012-10-16 23:47:52 +0200]: > > There's at least one thing (maybe more) missing for go support with > > musl : gcc 'split-stack' support (see http://blog.nella.org/?p=849 and > > http://gcc.gnu.org/wiki/SplitStacks). > > why does go need support from libc? it has its own runtime and libraries on raw syscalls > 4) Compile musl with '-fsplit-stack' and add no_split_stack attribute > to appropriate functions (at least all functions called before > pthread_self_init because %gs or %fs register is unusable before this > call). > what does a no_split_stack function do when it runs out of stack? most functions in musl may be run before pthread_self_init (it runs on demand when a pthread function is used) what's the use of split stack if some functions may not work with it? ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: TLS (thread-local storage) support 2012-10-16 22:09 ` Szabolcs Nagy @ 2012-10-16 23:16 ` boris brezillon 2012-10-17 10:37 ` Szabolcs Nagy 2012-10-16 23:29 ` Rich Felker 1 sibling, 1 reply; 26+ messages in thread From: boris brezillon @ 2012-10-16 23:16 UTC (permalink / raw) To: musl 2012/10/17 Szabolcs Nagy <nsz@port70.net>: > * boris brezillon <b.brezillon.musl@gmail.com> [2012-10-16 23:47:52 +0200]: >> > There's at least one thing (maybe more) missing for go support with >> > musl : gcc 'split-stack' support (see http://blog.nella.org/?p=849 and >> > http://gcc.gnu.org/wiki/SplitStacks). >> > > > why does go need support from libc? You're right: 1) I was talking about gccgo, but I realized there's another compiler (gc go) which does not rely on gcc at all. 2) Split stack is not mandatory for gccgo (see libgo/configure.ac in the gcc sources). But it's still possible to enable split-stack, and in that case the Go runtime relies on some libc functions (see libgcc/generic-morestack*). > > it has its own runtime and libraries on raw syscalls > >> 4) Compile musl with '-fsplit-stack' and add no_split_stack attribute >> to appropriate functions (at least all functions called before >> pthread_self_init because %gs or %fs register is unusable before this >> call). >> > > what does a no_split_stack function do when it runs out of stack? It segfaults. The no_split_stack attribute is meant for leaf functions, or for function call trees whose maximum stack usage never exceeds the space reserved for extra stack chunk allocation (I don't remember the exact value). > > most functions in musl may be run before pthread_self_init > (it runs on demand when a pthread function is used) This can be done during the dynamic linking process (by checking the split stack note). > > what's the use of split stack if some functions may not work with it? Only the explicitly specified functions (no_split_stack attribute) won't include the split stack prologue. 
It is the developer's responsibility to choose carefully which ones to tag as 'no_split_stack'. ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: TLS (thread-local storage) support 2012-10-16 23:16 ` boris brezillon @ 2012-10-17 10:37 ` Szabolcs Nagy 0 siblings, 0 replies; 26+ messages in thread From: Szabolcs Nagy @ 2012-10-17 10:37 UTC (permalink / raw) To: musl * boris brezillon <b.brezillon.musl@gmail.com> [2012-10-17 01:16:43 +0200]: > > most functions in musl may be run before pthread_self_init > > (it runs on demand when a pthread function is used) > This can be done during dynamic linking process (by checking the split > stack note). i meant that you would need to annotate almost all musl functions as no_split_stack because normally thread pointer is not initialized (but dalias commented that this might change) the dynamic loader can only do the initialization for dynamically linked executables so it's easier to just not compile musl with -fsplit-stack musl can easily give guarantees about its maximum stack usage assuming there is a bound to function call overhead and alignment overhead of auto variables etc (but dalias already gave better explanation) ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: TLS (thread-local storage) support 2012-10-16 22:09 ` Szabolcs Nagy 2012-10-16 23:16 ` boris brezillon @ 2012-10-16 23:29 ` Rich Felker 1 sibling, 0 replies; 26+ messages in thread From: Rich Felker @ 2012-10-16 23:29 UTC (permalink / raw) To: musl On Wed, Oct 17, 2012 at 12:09:22AM +0200, Szabolcs Nagy wrote: > most functions in musl may be run before pthread_self_init > (it runs on demand when a pthread function is used) This is tangential, but I've been considering changing that for a long time. My thought is to have startup code always attempt to setup the thread pointer (except in static binaries where it's statically determined that nothing will use it). If it failed with ENOSYS (missing syscall due to old kernel), musl would save a flag indicating such and have minimal support code to prevent crashing when using "plain libc" functions that have nothing to do with threads, so that old/simple software can run even on Linux 2.4. If it failed with any other reason (shouldn't be able to happen, but Linux is always introducing stupid resource-exhaustion reasons things can fail...) a_crash would be called before execution passes to the application code. Rich ^ permalink raw reply [flat|nested] 26+ messages in thread
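The startup policy described in the message above could look roughly like the sketch below. All names are hypothetical: `set_tp` stands in for whatever arch-specific call installs the thread pointer (assumed here to return 0 or a negated errno value), the two `fake_set_tp_*` functions are stand-ins for exercising the logic without a real kernel call, and `abort()` stands in for musl's a_crash.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Set when the kernel lacks the syscall; plain libc functions would
   consult this flag instead of touching the thread pointer. */
static int no_thread_pointer;

/* Hypothetical startup step: try to install the thread pointer, tolerate
   ENOSYS on old kernels, and refuse to continue on any other failure. */
int init_thread_pointer(int (*set_tp)(void *), void *tp)
{
    int r = set_tp(tp);
    if (r == 0)
        return 0;
    if (r == -ENOSYS) {
        no_thread_pointer = 1; /* run in "no threads" mode */
        return 0;
    }
    abort(); /* unexpected failure: crash before application code runs */
}

/* Accessor for the flag, so other code can skip thread-pointer use. */
int thread_pointer_missing(void)
{
    return no_thread_pointer;
}

/* Stand-in for a kernel that supports the call. */
int fake_set_tp_ok(void *tp)
{
    (void)tp;
    return 0;
}

/* Stand-in for an old kernel that lacks the call. */
int fake_set_tp_enosys(void *tp)
{
    (void)tp;
    return -ENOSYS;
}
```

The point of the flag is that the ENOSYS case stays recoverable for old or simple software, while every other failure is treated as fatal before the application gets control.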
* Re: TLS (thread-local storage) support 2012-10-16 21:47 ` boris brezillon 2012-10-16 22:09 ` Szabolcs Nagy @ 2012-10-16 22:54 ` Rich Felker 2012-10-16 23:39 ` boris brezillon 2012-10-19 18:39 ` orc 2 siblings, 1 reply; 26+ messages in thread From: Rich Felker @ 2012-10-16 22:54 UTC (permalink / raw) To: musl On Tue, Oct 16, 2012 at 11:47:52PM +0200, boris brezillon wrote: > 2012/10/16 boris brezillon <b.brezillon.musl@gmail.com>: > > Hi, > > > > First I'd like to thank Rich for adding TLS support (I started to work > > on it a few weeks ago but never had time to finish it). > > > > 2012/10/6 Daniel Cegiełka <daniel.cegielka@gmail.com>: > >> 2012/10/5 Rich Felker <dalias@aerifal.cx>: > >>> On Thu, Oct 04, 2012 at 11:29:11PM +0200, Daniel Cegiełka wrote: > >>>> great news! Finally able to compile Go (lang)... > >>> > >>> Did Go fail with gcc's emulated TLS in libgcc? > >> > >> I tested Go with sabotage (with fresh musl). I'll try to do it again... > >> gcc in sabotage was compiled without support for TLS, so I didn't > >> expect that it will be successful: > >> > >> https://github.com/rofl0r/sabotage/blob/master/pkg/gcc4 > >> > > There's at least one thing (maybe more) missing for go support with > > musl : gcc 'split-stack' support (see http://blog.nella.org/?p=849 and > > http://gcc.gnu.org/wiki/SplitStacks). > > > > I'm also interested in split stack support in musl but for other > > reasons (thread and coroutine stack automatic expansion). > > > > For x86/x86_64 split stack is implemented using a field inside the > > pthread struct which is accessed via %fs (or %gs for x86_64) and an > > offset. > > > > Currently this offset is defined at 0x30 (0x70 for x86_64) by the > > TARGET_THREAD_SPLIT_STACK_OFFSET but only if TARGET_LIBC_PROVIDES_SSP > > is defined (see gcc/config/i386/gnu-user.h or > > gcc/config/i386/gnu-user64.h). 
> > > > As far as I know musl does not support stack protection, but we could > > at least patch gcc to define TARGET_THREAD_SPLIT_STACK_OFFSET when > > using musl. > > > > We also need to reserve a field in the musl pthread struct. There are > > currently two fields named 'unused1' and 'unused2' but I'm not sure > > they're really unused in every supported arch. > > > > > > BTW, I'd like to work on a more integrated support of split stack in MUSL : I'm not a fan of split-stack for various reasons, but I have no objection to adding support to make it work as long as it's an optional feature that does not impair non-split-stack usage. > > 1) support in dynamic linker (see the last point of > > http://gcc.gnu.org/wiki/SplitStacks) : check split stack notes in > > shared libs (and program ?) It could be done, but is it really useful? There are infinitely many ways you can crash a program with libraries that were not built correctly for use with it. Checking for one of them seems like gratuitous complexity with little benefit. > > 2) support in thread implementation : currently when a thread is > > created the stack limit is set afterward (see > > https://github.com/mirrors/gcc/blob/master/libgcc/generic-morestack-thread.c > > and https://github.com/mirrors/gcc/blob/master/libgcc/config/i386/morestack.S) > > and the stack size is supposed to be 16K (which is the minimum stack > > size). This means we may reallocate a new stack chunk even if the > > previous one (the first one) is not fully used. > > If stack limit is set by thread implementation, this can be set > > appropriately according to the stack size defined by the thread > > creator. That's perfectly reasonable to support. > > 3) more optimizations I haven't thought about yet... > > > 4) Compile musl with '-fsplit-stack' and add no_split_stack attribute > to appropriate functions (at least all functions called before > pthread_self_init because %gs or %fs register is unusable before this > call). 
This is definitely not desirable, at least not by default. It hurts performance, possibly a lot, and destroys async-signal-safety. Also I doubt it's needed. As long as split stack mode leaves at least ~8k when calling a new function, most if not all functions in musl should run fine without needing support for enlarging the stack. > 5) set main thread stack limit to 0 (pthread_self_init) : the main > thread stack grow is handled by the kernel. > > 6) add no-split-stack note to every asm file. I'm against this, or any boilerplate clutter. If it's really needed, it should be possible with CFLAGS (or "ASFLAGS"), rather than modifying every file, and if there's no way to do it with command line options, that's a bug in gas. With that said, why would it be needed? I don't think there are any asm files that use more than 32 bytes of stack... > 7) make split stack support optional (either by checking the > -fsplit-stack option in CFLAGS or with a specific option : > --enable-split-stack) : split stack adds overhead to every functions > (except for those with the 'no_split_stack' attribute). > > > Do you have any concern about adding those features in musl ? Basically, the whole idea of split-stack is antithetical to the QoI guarantees of musl. A program using split-stack can crash at any time due to out-of-memory, and there is no reliable/portable way to recover from this condition. It's much like the following low-quality aspects of glibc and default Linux config: - overcommit - lazy allocation of libc-internal storage - lazy/on-demand allocation of TLS - dynamic loading of libgcc_s.so at runtime in pthread_cancel - etc. On 64-bit machines, split-stack is 100% useless. You can get the same behavior (crashing on OOM, but not having to know your stack size ahead of time) by just turning on overcommit and using huge thread stack sizes; the enormous 64-bit virtual address space makes it so you don't have to worry about running out of virtual memory. 
On 32-bit machines where virtual addresses are a precious resource, split-stack is a clever hack that essentially allows you to over-commit not just physical memory but virtual memory too. But it's inherently non-robust, and even worse than physical memory overcommit. At least in the latter case, the kernel can be intelligent about choosing an "abusive" process to kill. But if you run out of virtual memory, nothing can be done but terminating the whole process (you can't just terminate a single thread because it will leave resources in an inconsistent state). As such, I'm willing to add whatever inexpensive support framework is needed so that people who want to use split-stack can use it, but I'm very wary of invasive or costly changes to support a feature which I believe is fundamentally misguided (and, for 64-bit targets, utterly useless). Rich ^ permalink raw reply [flat|nested] 26+ messages in thread
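The 64-bit alternative mentioned above (a large thread stack instead of split-stack) can be sketched with the standard pthread attribute calls. The helper names and the 1 MiB buffer are illustrative assumptions; how much of the untouched stack costs physical memory depends on the kernel's overcommit configuration.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Worker that uses a 1 MiB local buffer, far more than a minimal
   thread stack would allow. */
static void *deep_worker(void *arg)
{
    volatile char buf[1 << 20];
    buf[0] = 1;
    buf[sizeof buf - 1] = 2;
    *(int *)arg = buf[0] + buf[sizeof buf - 1];
    return 0;
}

/* Hypothetical helper: run deep_worker on a thread whose stack has the
   requested size. Returns the worker's result (3) or -1 on failure. */
int run_with_big_stack(size_t stack_size)
{
    pthread_attr_t attr;
    pthread_t t;
    int result = 0;
    int rc;

    if (pthread_attr_init(&attr) != 0)
        return -1;
    if (pthread_attr_setstacksize(&attr, stack_size) != 0) {
        pthread_attr_destroy(&attr);
        return -1;
    }
    rc = pthread_create(&t, &attr, deep_worker, &result);
    pthread_attr_destroy(&attr);
    if (rc != 0)
        return -1;
    pthread_join(t, 0);
    return result;
}
```

A call such as `run_with_big_stack((size_t)64 << 20)` reserves 64 MiB of address space for the thread; on a 64-bit target that reservation is cheap, which is the argument being made against split-stack there.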
* Re: TLS (thread-local storage) support 2012-10-16 22:54 ` Rich Felker @ 2012-10-16 23:39 ` boris brezillon 2012-10-16 23:48 ` Rich Felker 0 siblings, 1 reply; 26+ messages in thread From: boris brezillon @ 2012-10-16 23:39 UTC (permalink / raw) To: musl 2012/10/17 Rich Felker <dalias@aerifal.cx>: > On Tue, Oct 16, 2012 at 11:47:52PM +0200, boris brezillon wrote: >> 2012/10/16 boris brezillon <b.brezillon.musl@gmail.com>: >> > Hi, >> > >> > First I'd like to thank Rich for adding TLS support (I started to work >> > on it a few weeks ago but never had time to finish it). >> > >> > 2012/10/6 Daniel Cegiełka <daniel.cegielka@gmail.com>: >> >> 2012/10/5 Rich Felker <dalias@aerifal.cx>: >> >>> On Thu, Oct 04, 2012 at 11:29:11PM +0200, Daniel Cegiełka wrote: >> >>>> great news! Finally able to compile Go (lang)... >> >>> >> >>> Did Go fail with gcc's emulated TLS in libgcc? >> >> >> >> I tested Go with sabotage (with fresh musl). I'll try to do it again... >> >> gcc in sabotage was compiled without support for TLS, so I didn't >> >> expect that it will be successful: >> >> >> >> https://github.com/rofl0r/sabotage/blob/master/pkg/gcc4 >> >> >> > There's at least one thing (maybe more) missing for go support with >> > musl : gcc 'split-stack' support (see http://blog.nella.org/?p=849 and >> > http://gcc.gnu.org/wiki/SplitStacks). >> > >> > I'm also interested in split stack support in musl but for other >> > reasons (thread and coroutine stack automatic expansion). >> > >> > For x86/x86_64 split stack is implemented using a field inside the >> > pthread struct which is accessed via %fs (or %gs for x86_64) and an >> > offset. >> > >> > Currently this offset is defined at 0x30 (0x70 for x86_64) by the >> > TARGET_THREAD_SPLIT_STACK_OFFSET but only if TARGET_LIBC_PROVIDES_SSP >> > is defined (see gcc/config/i386/gnu-user.h or >> > gcc/config/i386/gnu-user64.h). 
>> > >> > As far as I know musl does not support stack protection, but we could >> > at least patch gcc to define TARGET_THREAD_SPLIT_STACK_OFFSET when >> > using musl. >> > >> > We also need to reserve a field in the musl pthread struct. There are >> > currently two fields named 'unused1' and 'unused2' but I'm not sure >> > they're really unused in every supported arch. >> > >> > >> > BTW, I'd like to work on a more integrated support of split stack in MUSL : > > I'm not a fan of split-stack for various reasons, but I have no > objection to adding support to make it work as long as it's an > optional feature that does not impair non-split-stack usage. > >> > 1) support in dynamic linker (see the last point of >> > http://gcc.gnu.org/wiki/SplitStacks) : check split stack notes in >> > shared libs (and program ?) > > It could be done, but is it really useful? There are infinitely many > ways you can crash a program with libraries that were not built > correctly for use with it. Checking for one of them seems like > gratuitous complexity with little benefit. > >> > 2) support in thread implementation : currently when a thread is >> > created the stack limit is set afterward (see >> > https://github.com/mirrors/gcc/blob/master/libgcc/generic-morestack-thread.c >> > and https://github.com/mirrors/gcc/blob/master/libgcc/config/i386/morestack.S) >> > and the stack size is supposed to be 16K (which is the minimum stack >> > size). This means we may reallocate a new stack chunk even if the >> > previous one (the first one) is not fully used. >> > If stack limit is set by thread implementation, this can be set >> > appropriately according to the stack size defined by the thread >> > creator. > > That's perfectly reasonable to support. > >> > 3) more optimizations I haven't thought about yet... 
>> > >> 4) Compile musl with '-fsplit-stack' and add no_split_stack attribute >> to appropriate functions (at least all functions called before >> pthread_self_init because %gs or %fs register is unusable before this >> call). > > This is definitely not desirable, at least not by default. It hurts > performance, possibly a lot, and destroys async-signal-safety. Also I > doubt it's needed. As long as split stack mode leaves at least ~8k > when calling a new function, most if not all functions in musl should > run fine without needing support for enlarging the stack. I agree. This should be made optional. But if we don't compile libc with -fsplit-stack (i.e. we build it with -fno-split-stack), each call to a libc function from an external function compiled with split stack may lead to a 64K stack chunk allocation. > >> 5) set main thread stack limit to 0 (pthread_self_init) : the main >> thread stack grow is handled by the kernel. >> >> 6) add no-split-stack note to every asm file. > > I'm against this, or any boilerplate clutter. If it's really needed, > it should be possible with CFLAGS (or "ASFLAGS"), rather than > modifying every file, and if there's no way to do it with command line > options, that's a bug in gas. Not supported in gas, already tried. > > With that said, why would it be needed? I don't think there are any > asm files that use more than 32 bytes of stack... Same reason as 4) : 64K stack chunk allocation. > >> 7) make split stack support optional (either by checking the >> -fsplit-stack option in CFLAGS or with a specific option : >> --enable-split-stack) : split stack adds overhead to every functions >> (except for those with the 'no_split_stack' attribute). >> >> > Do you have any concern about adding those features in musl ? > > Basically, the whole idea of split-stack is antithetical to the QoI > guarantees of musl. 
A program using split-stack can crash at any time > due to out-of-memory, and there is no reliable/portable way to recover > from this condition. 
It's much like the following low-quality aspects > of glibc and default Linux config: The same program may crash because of stack overflow (segfault) or, worse, corrupt memory. At best, split stack provides a way to grow the thread's stack without crashing the whole process. At worst it crashes the program, but it never corrupts memory. > > - overcommit > - lazy allocation of libc-internal storage > - lazy/on-demand allocation of TLS > - dynamic loading of libgcc_s.so at runtime in pthread_cancel > - etc. > > On 64-bit machines, split-stack is 100% useless. You can get the same > behavior (crashing on OOM, but not having to know your stack size > ahead of time) by just turning on overcommit and using huge thread > stack sizes; the enormous 64-bit virtual address space makes it so you > don't have to worry about running out of virtual memory. > > On 32-bit machines where virtual addresses are a precious resource, > split-stack is a clever hack that essentially allows you to > over-commit not just physical memory but virtual memory too. But it's > inherently non-robust, and even worse than physical memory overcommit. > At least in the latter case, the kernel can be intelligent about > choosing an "abusive" process to kill. But if you run out of virtual > memory, nothing can be done but terminating the whole process (you > can't just terminate a single thread because it will leave resources > in an inconsistent state). > > As such, I'm willing to add whatever inexpensive support framework is > needed so that people who want to use split-stack can use it, but I'm > very wary of invasive or costly changes to support a feature which I > believe is fundamentally misguided (and, for 64-bit targets, utterly > useless). I understand. > > Rich ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: TLS (thread-local storage) support 2012-10-16 23:39 ` boris brezillon @ 2012-10-16 23:48 ` Rich Felker 2012-10-17 0:08 ` boris brezillon 0 siblings, 1 reply; 26+ messages in thread From: Rich Felker @ 2012-10-16 23:48 UTC (permalink / raw) To: musl On Wed, Oct 17, 2012 at 01:39:49AM +0200, boris brezillon wrote: > >> 4) Compile musl with '-fsplit-stack' and add no_split_stack attribute > >> to appropriate functions (at least all functions called before > >> pthread_self_init because %gs or %fs register is unusable before this > >> call). > > > > This is definitely not desirable, at least not by default. It hurts > > performance, possibly a lot, and destroys async-signal-safety. Also I > > doubt it's needed. As long as split stack mode leaves at least ~8k > > when calling a new function, most if not all functions in musl should > > run fine without needing support for enlarging the stack. > I agree. This should be made optional. But if we don't compile libc > with fsplit-stack (-fnosplit-stack). > Each call to a libc func from an external func compiled with split > stack may lead to a 64K stack chunk alloc. Where does this allocation take place from? There should simply be a way to inhibit it. > >> 6) add no-split-stack note to every asm file. > > > > I'm against this, or any boilerplate clutter. If it's really needed, > > it should be possible with CFLAGS (or "ASFLAGS"), rather than > > modifying every file, and if there's no way to do it with command line > > options, that's a bug in gas. > Not supported in gas, already tried. That's frustrating.. > > Basically, the whole idea of split-stack is antithetical to the QoI > > guarantees of musl. A program using split-stack can crash at any time > > due to out-of-memory, and there is no reliable/portable way to recover > > from this condition. 
It's much like the following low-quality aspects > > of glibc and default Linux config: > The same program may crash because of stack overflow (segfault) or > worst : corrupt memory. Only if written improperly. A correctly written program has bounded stack usage that's easily proven correct with static analysis. Unbounded stack usage is a bug, plain and simple, because there's no way to safely and portably handle the runtime error of running out of memory. > At best the split stack provides a way to increase the thread without > crashing the whole process. If you're comparing the behavior of a program with initial thread-stack size N and no-split-stack to a program with initial thread-stack size N that can also obtain additional stack space with split-stack, and you don't have static bounds on your stack usage that keep it below N, then I agree that the latter will succeed in cases where the former crashes. On the other hand, both programs WILL CRASH under appropriate conditions, and as such, they are both buggy programs. > At worst it crash the program but never corrupt the memory. Memory corruption will not happen without split stack either unless you turn off guard pages or use functions with huge stack frames without the -fstack-check option. > > As such, I'm willing to add whatever inexpensive support framework is > > needed so that people who want to use split-stack can use it, but I'm > > very wary of invasive or costly changes to support a feature which I > > believe is fundamentally misguided (and, for 64-bit targets, utterly > > useless). > > I understand. Getting into it more, I think split-stack is a lot harder to support than anybody has considered, especially if you want to still have a POSIX conforming environment. 
There are all sorts of nasty cases connected to signal handlers, async-signal-safety, async-cancel-safety, longjmp, and thread cancellation where I know at the very least you would need some ugly bloated hacks with unwinding to get them right, and where I'm doubtful you even _can_ make them 100% conforming. Getting this stuff right is highly non-trivial to begin with, even without split-stack (and glibc doesn't really even try) so I'm doubtful that the architects of split-stack even thought about it before throwing their experiment out there for everybody to use... Rich ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: TLS (thread-local storage) support 2012-10-16 23:48 ` Rich Felker @ 2012-10-17 0:08 ` boris brezillon 2012-10-17 0:42 ` Rich Felker 0 siblings, 1 reply; 26+ messages in thread From: boris brezillon @ 2012-10-17 0:08 UTC (permalink / raw) To: musl 2012/10/17 Rich Felker <dalias@aerifal.cx>: > On Wed, Oct 17, 2012 at 01:39:49AM +0200, boris brezillon wrote: >> >> 4) Compile musl with '-fsplit-stack' and add no_split_stack attribute >> >> to appropriate functions (at least all functions called before >> >> pthread_self_init because %gs or %fs register is unusable before this >> >> call). >> > >> > This is definitely not desirable, at least not by default. It hurts >> > performance, possibly a lot, and destroys async-signal-safety. Also I >> > doubt it's needed. As long as split stack mode leaves at least ~8k >> > when calling a new function, most if not all functions in musl should >> > run fine without needing support for enlarging the stack. >> I agree. This should be made optional. But if we don't compile libc >> with fsplit-stack (-fnosplit-stack). >> Each call to a libc func from an external func compiled with split >> stack may lead to a 64K stack chunk alloc. > > Where does this allocation take place from? There should simply be a > way to inhibit it. In the linker (gold linker). > >> >> 6) add no-split-stack note to every asm file. >> > >> > I'm against this, or any boilerplate clutter. If it's really needed, >> > it should be possible with CFLAGS (or "ASFLAGS"), rather than >> > modifying every file, and if there's no way to do it with command line >> > options, that's a bug in gas. >> Not supported in gas, already tried. > > That's frustrating.. > >> > Basically, the whole idea of split-stack is antithetical to the QoI >> > guarantees of musl. A program using split-stack can crash at any time >> > due to out-of-memory, and there is no reliable/portable way to recover >> > from this condition. 
It's much like the following low-quality aspects >> > of glibc and default Linux config: >> The same program may crash because of stack overflow (segfault) or >> worst : corrupt memory. > > Only if written improperly. A correctly written program has bounded > stack usage that's easily proven correct with static analysis. > Unbounded stack usage is a bug, plain and simple, because there's no > way to safely and portably handle the runtime error of running out of > memory. > >> At best the split stack provides a way to increase the thread without >> crashing the whole process. > > If you're comparing the behavior of a program with initial > thread-stack size N and no-split-stack to a program with initial > thread-stack size N that can also obtain additional stack space with > split-stack, and you don't have static bounds on your stack usage that > keep it below N, then I agree that the latter will succeed in cases > where the former crashes. On the other hand, both programs WILL CRASH > under appropriate conditions, and as such, they are both buggy > programs. > >> At worst it crash the program but never corrupt the memory. > > Memory corruption will not happen without split stack either unless > you turn off guard pages or use functions with huge stack frames > without the -fstack-check option. > >> > As such, I'm willing to add whatever inexpensive support framework is >> > needed so that people who want to use split-stack can use it, but I'm >> > very wary of invasive or costly changes to support a feature which I >> > believe is fundamentally misguided (and, for 64-bit targets, utterly >> > useless). >> >> I understand. > > Getting into it more, I think split-stack is a lot harder to support > than anybody has considered, especially if you want to still have a > POSIX conforming environment. 
There are all sorts of nasty cases > connected to signal handlers, async-signal-safety, > async-cancel-safety, longjmp, and thread cancellation where I know at > the very least you would need some ugly bloated hacks with unwinding > to get them right, and where I'm doubtful you even _can_ make them > 100% conforming. Getting this stuff right is highly non-trivial to > begin with, even without split-stack (and glibc doesn't really even > try) so I'm doubtful that the architects of split-stack even thought > about it before throwing their experiment out there for everybody to > use... > > Rich ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: TLS (thread-local storage) support 2012-10-17 0:08 ` boris brezillon @ 2012-10-17 0:42 ` Rich Felker 2012-10-17 1:03 ` boris brezillon 2012-10-17 1:49 ` boris brezillon 0 siblings, 2 replies; 26+ messages in thread From: Rich Felker @ 2012-10-17 0:42 UTC (permalink / raw) To: musl On Wed, Oct 17, 2012 at 02:08:11AM +0200, boris brezillon wrote: > >> I agree. This should be made optional. But if we don't compile libc > >> with fsplit-stack (-fnosplit-stack). > >> Each call to a libc func from an external func compiled with split > >> stack may lead to a 64K stack chunk alloc. > > > > Where does this allocation take place from? There should simply be a > > way to inhibit it. > In the linker (gold linker). Well gold isn't running at runtime. I assume you mean it _arranges_ for this allocation to take place somehow, and that's what I'm wondering about whether there's a way to avoid. Rich ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: TLS (thread-local storage) support 2012-10-17 0:42 ` Rich Felker @ 2012-10-17 1:03 ` boris brezillon 2012-10-17 1:49 ` boris brezillon 1 sibling, 0 replies; 26+ messages in thread From: boris brezillon @ 2012-10-17 1:03 UTC (permalink / raw) To: musl

2012/10/17 Rich Felker <dalias@aerifal.cx>:
> On Wed, Oct 17, 2012 at 02:08:11AM +0200, boris brezillon wrote:
>> >> I agree. This should be made optional. But if we don't compile libc with fsplit-stack (-fnosplit-stack).
>> >> Each call to a libc func from an external func compiled with split stack may lead to a 64K stack chunk alloc.
>> >
>> > Where does this allocation take place from? There should simply be a way to inhibit it.
>> In the linker (gold linker).
>
> Well gold isn't running at runtime. I assume you mean it _arranges_ for this allocation to take place somehow, and that's what I'm wondering about whether there's a way to avoid.

Sorry, this is done in __morestack_non_split (libgcc/config/i386/morestack.S). The linker replaces the __morestack call in the caller of the no_split_stack function with a call to __morestack_non_split.

>
> Rich
* Re: TLS (thread-local storage) support 2012-10-17 0:42 ` Rich Felker 2012-10-17 1:03 ` boris brezillon @ 2012-10-17 1:49 ` boris brezillon 2012-10-17 1:58 ` Rich Felker 1 sibling, 1 reply; 26+ messages in thread From: boris brezillon @ 2012-10-17 1:49 UTC (permalink / raw) To: musl 2012/10/17 Rich Felker <dalias@aerifal.cx>: > On Wed, Oct 17, 2012 at 02:08:11AM +0200, boris brezillon wrote: >> >> I agree. This should be made optional. But if we don't compile libc >> >> with fsplit-stack (-fnosplit-stack). >> >> Each call to a libc func from an external func compiled with split >> >> stack may lead to a 64K stack chunk alloc. >> > >> > Where does this allocation take place from? There should simply be a >> > way to inhibit it. >> In the linker (gold linker). > > Well gold isn't running at runtime. I assume you mean it _arranges_ > for this allocation to take place somehow, and that's what I'm > wondering about whether there's a way to avoid. The easiest way to avoid big stack chunk allocation is to compile musl with -fno-split-stack option. This will not add any overhead to functions (no split stack prolog) And this will add a note to the shared object which tells the linker to avoid __morestack to __morestack_non_split replacement. > > Rich ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: TLS (thread-local storage) support 2012-10-17 1:49 ` boris brezillon @ 2012-10-17 1:58 ` Rich Felker 2012-10-17 7:48 ` musl 0 siblings, 1 reply; 26+ messages in thread From: Rich Felker @ 2012-10-17 1:58 UTC (permalink / raw) To: musl On Wed, Oct 17, 2012 at 03:49:33AM +0200, boris brezillon wrote: > 2012/10/17 Rich Felker <dalias@aerifal.cx>: > > On Wed, Oct 17, 2012 at 02:08:11AM +0200, boris brezillon wrote: > >> >> I agree. This should be made optional. But if we don't compile libc > >> >> with fsplit-stack (-fnosplit-stack). > >> >> Each call to a libc func from an external func compiled with split > >> >> stack may lead to a 64K stack chunk alloc. > >> > > >> > Where does this allocation take place from? There should simply be a > >> > way to inhibit it. > >> In the linker (gold linker). > > > > Well gold isn't running at runtime. I assume you mean it _arranges_ > > for this allocation to take place somehow, and that's what I'm > > wondering about whether there's a way to avoid. > > The easiest way to avoid big stack chunk allocation is to compile musl > with -fno-split-stack option. > This will not add any overhead to functions (no split stack prolog) > And this will add a note to the shared object which tells the linker > to avoid __morestack to __morestack_non_split replacement. Where is this documented? The GCC manual doesn't mention anything about -fno-split-stack having special behavior like that, so for lack of any documentation otherwise, it "should" just be the option to turn off -fsplit-stack.. I'm not claiming you're wrong, just that this all seems poorly documented. Rich ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: TLS (thread-local storage) support 2012-10-17 1:58 ` Rich Felker @ 2012-10-17 7:48 ` musl 0 siblings, 0 replies; 26+ messages in thread From: musl @ 2012-10-17 7:48 UTC (permalink / raw) To: musl On 17/10/2012 03:58, Rich Felker wrote: > On Wed, Oct 17, 2012 at 03:49:33AM +0200, boris brezillon wrote: >> 2012/10/17 Rich Felker <dalias@aerifal.cx>: >>> On Wed, Oct 17, 2012 at 02:08:11AM +0200, boris brezillon wrote: >>>>>> I agree. This should be made optional. But if we don't compile libc >>>>>> with fsplit-stack (-fnosplit-stack). >>>>>> Each call to a libc func from an external func compiled with split >>>>>> stack may lead to a 64K stack chunk alloc. >>>>> Where does this allocation take place from? There should simply be a >>>>> way to inhibit it. >>>> In the linker (gold linker). >>> Well gold isn't running at runtime. I assume you mean it _arranges_ >>> for this allocation to take place somehow, and that's what I'm >>> wondering about whether there's a way to avoid. >> The easiest way to avoid big stack chunk allocation is to compile musl >> with -fno-split-stack option. >> This will not add any overhead to functions (no split stack prolog) >> And this will add a note to the shared object which tells the linker >> to avoid __morestack to __morestack_non_split replacement. > Where is this documented? The GCC manual doesn't mention anything > about -fno-split-stack having special behavior like that, so for lack > of any documentation otherwise, it "should" just be the option to turn > off -fsplit-stack.. You're right, I misunderstood how -fno-split-stack was implemented. I tried to compile a source file with -fno-split-stack and didn't find any 'no-split-stack' note in the generated object file. When I compile it with -fsplit-stack both 'no-split-stack' and 'split-stack' notes are added. > > I'm not claiming you're wrong, just that this all seems poorly > documented. > > Rich ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: TLS (thread-local storage) support 2012-10-16 21:47 ` boris brezillon 2012-10-16 22:09 ` Szabolcs Nagy 2012-10-16 22:54 ` Rich Felker @ 2012-10-19 18:39 ` orc 2012-10-19 18:41 ` Rich Felker 2 siblings, 1 reply; 26+ messages in thread From: orc @ 2012-10-19 18:39 UTC (permalink / raw) To: musl On Tue, 16 Oct 2012 23:47:52 +0200 boris brezillon <b.brezillon.musl@gmail.com> wrote: > 2012/10/16 boris brezillon <b.brezillon.musl@gmail.com>: > > Hi, > > > > First I'd like to thank Rich for adding TLS support (I started to > > work on it a few weeks ago but never had time to finish it). > > > > 2012/10/6 Daniel Cegiełka <daniel.cegielka@gmail.com>: > >> 2012/10/5 Rich Felker <dalias@aerifal.cx>: > >>> On Thu, Oct 04, 2012 at 11:29:11PM +0200, Daniel Cegiełka wrote: > >>>> great news! Finally able to compile Go (lang)... > >>> > >>> Did Go fail with gcc's emulated TLS in libgcc? > >> > >> I tested Go with sabotage (with fresh musl). I'll try to do it > >> again... gcc in sabotage was compiled without support for TLS, so > >> I didn't expect that it will be successful: > >> > >> https://github.com/rofl0r/sabotage/blob/master/pkg/gcc4 > >> > > There's at least one thing (maybe more) missing for go support with > > musl : gcc 'split-stack' support (see http://blog.nella.org/?p=849 > > and http://gcc.gnu.org/wiki/SplitStacks). > > > > I'm also interested in split stack support in musl but for other > > reasons (thread and coroutine stack automatic expansion). > > > > For x86/x86_64 split stack is implemented using a field inside the > > pthread struct which is accessed via %fs (or %gs for x86_64) and an > > offset. > > > > Currently this offset is defined at 0x30 (0x70 for x86_64) by the > > TARGET_THREAD_SPLIT_STACK_OFFSET but only if > > TARGET_LIBC_PROVIDES_SSP is defined (see gcc/config/i386/gnu-user.h > > or gcc/config/i386/gnu-user64.h). 
> > > > As far as I know musl does not support stack protection, but we > > could at least patch gcc to define TARGET_THREAD_SPLIT_STACK_OFFSET > > when using musl. > > > > We also need to reserve a field in the musl pthread struct. There > > are currently two fields named 'unused1' and 'unused2' but I'm not > > sure they're really unused in every supported arch. > > > > > > BTW, I'd like to work on a more integrated support of split stack > > in MUSL : > > > > 1) support in dynamic linker (see the last point of > > http://gcc.gnu.org/wiki/SplitStacks) : check split stack notes in > > shared libs (and program ?) > > > > 2) support in thread implementation : currently when a thread is > > created the stack limit is set afterward (see > > https://github.com/mirrors/gcc/blob/master/libgcc/generic-morestack-thread.c > > and > > https://github.com/mirrors/gcc/blob/master/libgcc/config/i386/morestack.S) > > and the stack size is supposed to be 16K (which is the minimum > > stack size). This means we may reallocate a new stack chunk even if > > the previous one (the first one) is not fully used. If stack limit > > is set by thread implementation, this can be set appropriately > > according to the stack size defined by the thread creator. > > > > 3) more optimizations I haven't thought about yet... > > > 4) Compile musl with '-fsplit-stack' and add no_split_stack attribute > to appropriate functions (at least all functions called before > pthread_self_init because %gs or %fs register is unusable before this > call). > > 5) set main thread stack limit to 0 (pthread_self_init) : the main > thread stack grow is handled by the kernel. > > 6) add no-split-stack note to every asm file. Why anything works only after putting a weak spikes that break after a slight touch? 
>
> 7) make split stack support optional (either by checking the -fsplit-stack option in CFLAGS or with a specific option : --enable-split-stack) : split stack adds overhead to every functions (except for those with the 'no_split_stack' attribute).
>
> > Do you have any concern about adding those features in musl ?
> >
> > Let me know if you see other issues I haven't noticed.
> >
> >
> > Regards,
> >
> > Boris

After reading the whole thread, I agree with Rich that this is not only hard to implement but completely useless. From another point of view: people expect musl's code to be easy to read and understand, code that not only works but is also easy to modify, debug, and build. Why extend it with features not even related to libc? (Is it mostly a hack from gcc-binutils again?) Not to mention the people who use (or will use) other compilers and linkers.
* Re: TLS (thread-local storage) support 2012-10-19 18:39 ` orc @ 2012-10-19 18:41 ` Rich Felker 0 siblings, 0 replies; 26+ messages in thread From: Rich Felker @ 2012-10-19 18:41 UTC (permalink / raw) To: musl

On Sat, Oct 20, 2012 at 02:39:43AM +0800, orc wrote:
> > 4) Compile musl with '-fsplit-stack' and add no_split_stack attribute to appropriate functions (at least all functions called before pthread_self_init because %gs or %fs register is unusable before this call).
> >
> > 5) set main thread stack limit to 0 (pthread_self_init) : the main thread stack grow is handled by the kernel.
> >
> > 6) add no-split-stack note to every asm file.
> Why anything works only after putting a weak spikes that break after a slight touch?

I don't follow what you're saying here.

> > 7) make split stack support optional (either by checking the -fsplit-stack option in CFLAGS or with a specific option : --enable-split-stack) : split stack adds overhead to every functions (except for those with the 'no_split_stack' attribute).
> >
> > > Do you have any concern about adding those features in musl ?
> > >
> > > Let me know if you see other issues I haven't noticed.
> > >
> > >
> > > Regards,
> > >
> > > Boris
>
> After reading whole thread I agree with Rich that this one is not only hard to implement, but completely useless. From other point of view:

I think it's hard (read: probably impossible) to implement in a way that's robust and correct, but it may not be too hard to implement the minimal support code so that folks who insist on using -fsplit-stack will not get pathologically bad behavior due to the calling code being unaware that it already has plenty of pre-allocated stack space to run on.

> people expect from musl an easy to read and understand code, that not only works, but is easy to understand, modify, debug and build. Why extend it with features not even related to libc? (It is mostly a hack from gcc-binutils again?)

I agree.
I definitely don't want to compromise on correctness/robustness for the sake of this, and I'd also like to avoid adding complexity or maintenance burdens. Rich ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: TLS (thread-local storage) support 2012-10-04 21:13 TLS (thread-local storage) support Rich Felker 2012-10-04 21:29 ` Daniel Cegiełka @ 2012-10-05 3:04 ` Rich Felker 2012-10-05 17:27 ` Rich Felker 1 sibling, 1 reply; 26+ messages in thread From: Rich Felker @ 2012-10-05 3:04 UTC (permalink / raw) To: musl On Thu, Oct 04, 2012 at 05:13:32PM -0400, Rich Felker wrote: > Hi, > > I've committed the initial version of thread-local storage > (__thread/_Thread_local keyword). So far, it only works in > static-linked applications, Scratch that. It's now supported everywhere except dynamically loaded (dlopen'd) shared libraries. And I'm working on adding support for them too. So far only i386 is tested, but at least x86_64 is also very likely to work (it's basically the same). > and might or might not be working properly > on arm, mips, and microblaze. I believe it's working on ARM, but it's completely untested. Microblaze and MIPS do not yet have the necessary relocation processing, but TLS in the main executable (static or dynamic linked) _might_ work. > The latter is a matter of whether these > archs need "TLS variant I" instead of the much cleaner/saner "variant > II" used by i386 and x86_64; So far, I can't see anywhere the variant is relevant to the ABI; it seems we can just use "variant II" unconditionally. Let's hope I'm right because I don't feel like dealing with more ugly, gratuitous special-case code. Rich ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: TLS (thread-local storage) support 2012-10-05 3:04 ` Rich Felker @ 2012-10-05 17:27 ` Rich Felker 2012-10-06 14:33 ` Szabolcs Nagy 0 siblings, 1 reply; 26+ messages in thread From: Rich Felker @ 2012-10-05 17:27 UTC (permalink / raw) To: musl On Thu, Oct 04, 2012 at 11:04:14PM -0400, Rich Felker wrote: > On Thu, Oct 04, 2012 at 05:13:32PM -0400, Rich Felker wrote: > > Hi, > > > > I've committed the initial version of thread-local storage > > (__thread/_Thread_local keyword). So far, it only works in > > static-linked applications, > > Scratch that. It's now supported everywhere except dynamically loaded > (dlopen'd) shared libraries. And I'm working on adding support for And they're working now too. I've also made some general fixes and improvements to the dynamic linker -- minor corrections in how library files are located, and support for recursive calls to dlopen (happens when a library has constructors and one of those constructors calls dlopen). This same change was also necessary to avoid blocking pthread_create calls for the entire duration of constructor execution. Some further dynamic linker development directions: - Unifying the relocation code in arch/$(ARCH)/reloc.h to minimize duplication. - Adding dlsym() support for TLS vars (obtaining current thread's copy). - Cleanup and reduction of code duplication - phdr parsing and symbol lookup logic is duplicated in several places. And of course testing TLS on other archs and fixing anything that's broken... Rich ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: TLS (thread-local storage) support 2012-10-05 17:27 ` Rich Felker @ 2012-10-06 14:33 ` Szabolcs Nagy 2012-10-06 20:39 ` Szabolcs Nagy 0 siblings, 1 reply; 26+ messages in thread From: Szabolcs Nagy @ 2012-10-06 14:33 UTC (permalink / raw) To: musl

[-- Attachment #1: Type: text/plain, Size: 522 bytes --]

* Rich Felker <dalias@aerifal.cx> [2012-10-05 13:27:29 -0400]:
> On Thu, Oct 04, 2012 at 11:04:14PM -0400, Rich Felker wrote:
> > Scratch that. It's now supported everywhere except dynamically loaded (dlopen'd) shared libraries. And I'm working on adding support for
>
> And they're working now too.
>

should the attached code work with dlopen when compiled as a dso?

(i wanted to check if the alignments are ok after a dlopen, but i can see how this usage may not be supported)

it seems it dies here in the ctor

[-- Attachment #2: tls.c --]
[-- Type: text/x-csrc, Size: 574 bytes --]

#include <stddef.h>

__thread char c1 = 1;
__thread char xchar = 2;
__thread char c2 = 3;
__thread short xshort = 4;
__thread char c3 = 5;
__thread int xint = 6;
__thread char c4 = 7;
__thread long long xllong = 8;

struct {
    char *name;
    size_t size;
    size_t align;
    size_t addr;
} t[4];

#define entry(i,x) \
    t[i].name = #x; \
    t[i].size = sizeof x; \
    t[i].align = __alignof__(x); \
    t[i].addr = (size_t)&x;

__attribute__((constructor)) static void init(void)
{
    entry(0, xchar)
    entry(1, xshort)
    entry(2, xint)
    entry(3, xllong)
}
* Re: TLS (thread-local storage) support 2012-10-06 14:33 ` Szabolcs Nagy @ 2012-10-06 20:39 ` Szabolcs Nagy 2012-10-06 20:58 ` Rich Felker 0 siblings, 1 reply; 26+ messages in thread From: Szabolcs Nagy @ 2012-10-06 20:39 UTC (permalink / raw) To: musl

* Szabolcs Nagy <nsz@port70.net> [2012-10-06 16:33:01 +0200]:
> should the attached code work with dlopen when compiled as a dso?
>
> (i wanted to check if the alignments are ok after a dlopen, but i can see how this usage may not be supported)
>
> it seems it dies here in the ctor

a more minimal example:

a.c:
__thread int xx;
int *p;
__attribute__((constructor)) static void init(void)
{
    p = &xx;
}

b.c:
#include <dlfcn.h>
void *h;
int main()
{
    h = dlopen("./a.so", RTLD_LAZY);
}

compiled as
musl-gcc -shared -fPIC -g -o a.so a.c
musl-gcc -g -o b b.c

./b segfaults in init at p=&xx
* Re: TLS (thread-local storage) support 2012-10-06 20:39 ` Szabolcs Nagy @ 2012-10-06 20:58 ` Rich Felker 0 siblings, 0 replies; 26+ messages in thread From: Rich Felker @ 2012-10-06 20:58 UTC (permalink / raw) To: musl

On Sat, Oct 06, 2012 at 10:39:39PM +0200, Szabolcs Nagy wrote:
> * Szabolcs Nagy <nsz@port70.net> [2012-10-06 16:33:01 +0200]:
> > should the attached code work with dlopen when compiled as a dso?
> >
> > (i wanted to check if the alignments are ok after a dlopen, but i can see how this usage may not be supported)
> >
> > it seems it dies here in the ctor
>
> a more minimal example:
>
> a.c:
> __thread int xx;
> int *p;
> __attribute__((constructor)) static void init(void)
> {
>     p = &xx;
> }
>
> b.c:
> #include <dlfcn.h>
> void *h;
> int main()
> {
>     h = dlopen("./a.so", RTLD_LAZY);
> }
>
> compiled as
> musl-gcc -shared -fPIC -g -o a.so a.c
> musl-gcc -g -o b b.c
>
> ./b segfaults in init at p=&xx

Very stupid issue, fixed by commit 92e1cd9b0ba9a8fa86e0346b121e159fb88f99bc:

http://git.musl-libc.org/cgit/musl/commit/?id=92e1cd9b0ba9a8fa86e0346b121e159fb88f99bc

Rich