mailing list of musl libc
* TLS storage offsets for TLS_ABOVE_TP
@ 2018-02-09 18:07 Nicholas Wilson
  2018-02-09 20:03 ` Rich Felker
From: Nicholas Wilson @ 2018-02-09 18:07 UTC
  To: musl

Hi,

I have a question about the support for TLS_ABOVE_TP ("TLS variant I").

In archs like ARM, we have a matched pair of functions TP_ADJ and __pthread_self, which adjust a pthread* to the thread-register and back again. For ARM, there's an additional offset of 8 (and 16 for AArch64), which is part of the ABI to ensure a) the TP points to the DTV, and b) the main module's TLS block is at a known offset from the TP.

However, the ARM adjustment code uses "TP = pthread* + sizeof(pthread) - 8". That's correct for the arch ABI, in that the linker requires that the thread storage be located 8 bytes above the TP, and Musl does indeed store the TLS block there right after the struct pthread.

But what's odd is that you have the pthread->dtv_copy member right at the end of the pthread struct. So the thread pointer is pointing not to pthread->dtv_copy, but actually to pthread->canary_at_end.

Is my mental compiler going wrong? I don't have an ARM machine to actually execute the code on, but I've just been staring at it for an hour and can't work it out.

One thing I have noticed is that in all the four TLS models (general dynamic, local dynamic, initial exec, local exec), the DTV isn't actually dereferenced directly by compiler-generated code. The whole thing about having the DTV available to the compiler in a known location in fact is not used! So maybe that's how you get away with it.

For the sake of "correctness" and conformance though, I wonder if there should be a final "void *dtv_pad" member at the end of struct pthread, so that the DTV block at the end of the struct pthread has the right size for the platform.

I'd be happy to hear I'm wrong! (Maybe a diagram in the source code would help comprehension, to show how the memory is laid out, including the alignment bits and pieces. The FreeBSD libc source code has a helpful bit of ASCII art that draws the layout of the various bits of data and the alignment blocks between them.)

All the best,
Nick


* Re: TLS storage offsets for TLS_ABOVE_TP
  2018-02-09 18:07 TLS storage offsets for TLS_ABOVE_TP Nicholas Wilson
@ 2018-02-09 20:03 ` Rich Felker
  2018-02-09 20:39   ` Nicholas Wilson
From: Rich Felker @ 2018-02-09 20:03 UTC
  To: musl

On Fri, Feb 09, 2018 at 06:07:25PM +0000, Nicholas Wilson wrote:
> Hi,
> 
> I have a question about the support for TLS_ABOVE_TP ("TLS variant
> I").
> 
> In archs like ARM, we have a matched pair of functions TP_ADJ and
> __pthread_self, which adjust a pthread* to the thread-register and
> back again. For ARM, there's an additional offset of 8 (and 16 for
> AArch64), which is part of the ABI to ensure a) the TP points to the
> DTV, and b) the main module's TLS block is at a known offset from
> the TP.
> 
> However, the ARM adjustment code uses "TP = pthread* +
> sizeof(pthread) - 8". That's correct for the arch ABI, in that the
> linker requires that the thread storage be located 8 bytes above the
> TP, and Musl does indeed store the TLS block there right after the
> struct pthread.
> 
> But what's odd is that you have the pthread->dtv_copy member right
> at the end of the pthread struct. So the thread pointer is pointing
> not to pthread->dtv_copy, but actually to pthread->canary_at_end.

dtv_copy at the end is just for internal use by ASM that needs to be
able to find the dtv pointer. I don't think there's any code on ARM
that uses it, but some archs might.

> Is my mental compiler going wrong? I don't have an ARM machine to
> actually execute the code on, but I've just been staring at it for
> an hour and can't work it out.
> 
> One thing I have noticed is that in all the four TLS models (general
> dynamic, local dynamic, initial exec, local exec), the DTV isn't
> actually dereferenced directly by compiler-generated code. The whole
> thing about having the DTV available to the compiler in a known
> location in fact is not used! So maybe that's how you get away with it.

This is exactly right. There is no valid way the compiler can generate
code to access the DTV because it doesn't know the generation counter
to compare against (on glibc). Drepper's TLS ABI document muddled a
lot of things like this which are actually implementation details
because he was documenting his implementation rather than the
compiler-linker, linker-ldso, etc. ABI boundaries. This was an
important observation at the time I implemented TLS in musl. Since
access to the DTV is not actually ABI, its form is not fixed but an
implementation detail, and we used a form that omits the glibc
generation counters since we don't unload modules.

> For the sake of "correctness" and conformance though, I wonder if
> there should be a final "void *dtv_pad" member at the end of struct
> pthread, so that the DTV block at the end of the struct pthread has
> the right size for the platform.

No, this would shift canary_at_end back, breaking the ABI for it --
and the canary-at-end *is* ABI on the archs that use it. (If in the
future there are multiple incompatible places the canary could be
across different archs, we'll have to adjust this section with
preprocessor conditionals.) Alternatively we might be able to adjust
the shift of struct __pthread relative to the TP per-arch.

> I'd be happy to hear I'm wrong! (Maybe a diagram in the source code
> would help comprehension, to show how the memory is laid out,
> including the alignment bits and pieces. The FreeBSD libc source
> code has a helpful bit of ASCII art that draws the layout of the
> various bits of data and the alignment blocks between them.)

It's not so much that you're wrong as that the TLS document is
misleading.

Rich



* Re: TLS storage offsets for TLS_ABOVE_TP
  2018-02-09 20:03 ` Rich Felker
@ 2018-02-09 20:39   ` Nicholas Wilson
From: Nicholas Wilson @ 2018-02-09 20:39 UTC
  To: musl

Hi Rich,

Thanks as ever for your helpful response.

It sounds like these things need to be written up somewhere; if there isn't a document, maybe some code comments would help in the future if someone else is confused (or, like me, trying to port to a new arch).

So on a new arch, we can be free to put the TP wherever we want with respect to the TLS block, as long as the linker and Musl agree on the distance between them. I'll pick 2*sizeof(void*) for Wasm to match Arm/AArch64 (I prefer "variant 1" to "variant 2" given the generated code sequence is simpler for the local exec model that Wasm will use).

It's a shame that the glibc spec is so out of date; perhaps a Musl wiki page would be helpful as a reference too?

All the best,
Nick
