From: Rich Felker
Newsgroups: gmane.linux.lib.musl.general
Subject: Re: Does TD point to itself intentionally?
Date: Sat, 30 Mar 2019 10:39:39 -0400
Message-ID: <20190330143939.GI23599@brightrain.aerifal.cx>
References: <20190330103814.GB18043@voyager>
In-Reply-To: <20190330103814.GB18043@voyager>
Reply-To: musl@lists.openwall.com
To: musl@lists.openwall.com

On Sat, Mar 30, 2019 at 11:38:14AM +0100, Markus Wichmann wrote:
> Hi all,
>
> I was looking over my old C experiments and saw an old file, trying to
> use clang's address_space attribute to access something like a thread
> pointer. That made me wonder how it is implemented in musl.

I've experimented with using the equivalent in GCC to get musl to
generate %gs:offset or %fs:offset for access to fields in the thread
structure.
Unfortunately you need -fasm or they silently don't work -- see
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87626 for details. It
does help code generation somewhat and gave measurable performance
benefits in microbenchmarks (mainly due to reducing register
pressure), but would require making a separate __self() or something
that returns the address-spaced pointer, whose value, unlike the one
__pthread_self() returns, is not valid for assignment to ordinary
pointers or for passing as an argument. Also, experiments showed that
GCC generated multiple instances of __self() on archs where the asm to
load the thread pointer was actually more expensive than caching the
result in a register. This could be partly mitigated by adding some
\n\n\n to the asm... *facepalm*

> In most architectures, the thread pointer is just stored in a register,
> and __pthread_self() will just grab it out of there. For x86_64,
> something slightly similar happens: The thread pointer is stored in
> FS.base, which is an MSR the kernel has to set for us, but we can read
> it with FS-relative addressing.
>
> Incidentally: Is there any interest in using the "wrfsbase" instruction
> for that, where available? From a cursory first glance, it looks like
> that would mean that musl would have to do the entire CPUID dance on
> AMD64 and i386, and in the latter case the dance would be a bit longer
> since the ID bit dance would have to precede it.

No. Even a single insn to test the stored result of whether such a
feature is available (in practice it would take several and a branch)
is more expensive than loading from %fs:0. And even without having to
make a runtime test, it should cost the same as, or possibly still
more than, loading from %fs:0.

> Back to setting the thread pointer: The relevant code is in __init_tp(),
> which is always called with the return value from __copy_tls(), which
> points to the new thread descriptor.
> __init_tp() will then call
> __set_thread_area() with the adjusted thread pointer, and on AMD64, this
> will just call arch_prctl(SET_FS, p). Though I don't know why that
> function has to be in assembly.
>
> OK, got it. After this, FS.base will point directly at the TD, so we can
> just load FS.base into any register and have a thread pointer, right?
> Enter __pthread_self():
>
> static inline struct pthread *__pthread_self()
> {
> 	struct pthread *self;
> 	__asm__ ("mov %%fs:0,%0" : "=r" (self) );
> 	return self;
> }
>
> But that is not the same thing! This will load FS.base, and then
> dereference it and load the qword it is pointing at into a register. So
> how did this ever work? Well, the answer is back in __init_tp():
>
> 	td->self = td;
>
> And of course, "self" is the first member of struct pthread.
>
> So, now the question I've been building up to: Is that intentional? Is

Yes, this is intentional. It's the documented ABI for x86[_64], and
necessary for the operation of code generated by a compiler conforming
to the ABI that takes &tlsvar via the initial-exec or local-exec
model.

> there a reason for there to be a pointer pointing to itself, other than
> the "mov" in __pthread_self()? Could that mov not be replaced with a
> "lea" and save one useless memory access?

The effective address computed by lea would be relative to %fs or %gs,
i.e. just an offset from the segment base rather than a linear address
usable as an ordinary pointer. It's not useful.

Rich
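
For readers who want to see the self-pointer argument concretely, here
is a portable sketch that models segment-relative addressing in plain
C. It is not musl code: struct fake_pthread, sim_fs_base and all the
sim_* functions are hypothetical names made up for illustration. The
assumption being modelled is that code can load from base+offset but
cannot read the base itself, and that lea yields only the offset.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for struct pthread; only the layout matters. */
struct fake_pthread {
    struct fake_pthread *self; /* first member, so it sits at offset 0 */
    int tid;
};

/* Simulated FS.base. Real code cannot read it directly (without
 * rdfsbase); it can only issue loads relative to it. */
static uintptr_t sim_fs_base;

/* Models arch_prctl setting FS.base plus "td->self = td". */
void sim_init_tp(struct fake_pthread *td)
{
    assert(td != NULL);
    td->self = td;
    sim_fs_base = (uintptr_t)td;
}

/* Models "mov %fs:off,%reg": load a pointer-sized word at base+off. */
uintptr_t sim_mov_fs(size_t off)
{
    uintptr_t v;
    memcpy(&v, (const void *)(sim_fs_base + off), sizeof v);
    return v;
}

/* Models "lea %fs:off,%reg": lea never adds the segment base, so the
 * result is only the offset, not a usable linear address. */
uintptr_t sim_lea_fs(size_t off)
{
    return (uintptr_t)off;
}

/* Models __pthread_self(): recover the base via the stored self pointer. */
struct fake_pthread *sim_pthread_self(void)
{
    return (struct fake_pthread *)
        sim_mov_fs(offsetof(struct fake_pthread, self));
}
```

After sim_init_tp(&td), sim_pthread_self() returns &td because the
word at offset 0 holds the descriptor's own address, while
sim_lea_fs(0) returns 0, which is why the extra load cannot simply be
replaced by lea.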