From mboxrd@z Thu Jan  1 00:00:00 1970
X-Msuck: nntp://news.gmane.org/gmane.linux.lib.musl.general/14040
Path: news.gmane.org!.POSTED.blaine.gmane.org!not-for-mail
From: Rich Felker <dalias@libc.org>
Newsgroups: gmane.linux.lib.musl.general
Subject: Re: Does TD point to itself intentionally?
Date: Sat, 30 Mar 2019 10:39:39 -0400
Message-ID: <20190330143939.GI23599@brightrain.aerifal.cx>
References: <20190330103814.GB18043@voyager>
Reply-To: musl@lists.openwall.com
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
User-Agent: Mutt/1.5.21 (2010-09-15)
To: musl@lists.openwall.com
Content-Disposition: inline
In-Reply-To: <20190330103814.GB18043@voyager>
Original-Sender: Rich Felker <dalias@aerifal.cx>
Xref: news.gmane.org gmane.linux.lib.musl.general:14040
Archived-At: <http://permalink.gmane.org/gmane.linux.lib.musl.general/14040>

On Sat, Mar 30, 2019 at 11:38:14AM +0100, Markus Wichmann wrote:
> Hi all,
> 
> I was looking over my old C experiments and saw an old file, trying to
> use clang's address_space attribute to access something like a thread
> pointer. That made me wonder how it is implemented in musl.

I've experimented with using the equivalent in GCC to get musl to
generate %gs:offset or %fs:offset for access to fields in the thread
structure. Unfortunately you need -fasm or they silently don't work --
see https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87626 for details. It
does help code generation somewhat and gave measurable performance
benefits in microbenchmarks (mainly due to reducing register
pressure), but it would require a separate __self() or similar that
returns the address-spaced pointer, whose value, unlike that of
__pthread_self(), is not valid for assignment to ordinary pointers or
for passing as an argument. Also, experiments showed that GCC generated
multiple instances of __self() on archs where the asm to load the
thread pointer was actually more expensive than caching the result in
a register. This could be partly mitigated by adding some \n\n\n to
the asm... *facepalm*
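
A minimal sketch of the named-address-space approach described above,
assuming x86_64 and GCC's __seg_fs keyword (advertised by the __SEG_FS
macro). The struct toy_thread and the function thread_pointer() are
hypothetical illustrations, not musl code; only the first word of the
thread structure (the self pointer at %fs:0) is modeled. Because of the
GCC bug linked above, strict ISO modes fall back here to plain inline
asm, and non-x86_64 targets fall back to pthread_self().

```c
#include <pthread.h>

/* Hypothetical stand-in for musl's struct pthread: only the first
   member (the self pointer stored at %fs:0) is modeled. */
struct toy_thread {
    struct toy_thread *self;
};

#if defined(__x86_64__) && defined(__SEG_FS) && !defined(__STRICT_ANSI__)
/* An address-spaced "pointer" to the thread structure. Its value (0) is
   an offset within the %fs segment, not an ordinary address, so it must
   never be assigned to a plain pointer or passed as an argument. */
#define TOY_SELF ((struct toy_thread __seg_fs *)0)

static void *thread_pointer(void)
{
    /* GCC emits a single "mov %fs:0,reg" for this field access. */
    return TOY_SELF->self;
}
#elif defined(__x86_64__)
static void *thread_pointer(void)
{
    /* The same load, written as inline asm. */
    void *self;
    __asm__ ("mov %%fs:0,%0" : "=r" (self));
    return self;
}
#else
static void *thread_pointer(void)
{
    /* Portable fallback; not a thread-pointer read. */
    return (void *)pthread_self();
}
#endif
```

On x86_64 both musl and glibc store the descriptor's own address at
%fs:0, so thread_pointer() should compare equal to pthread_self() there.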

> In most architectures, the thread pointer is just stored in a register,
> and __pthread_self() will just grab it out of there. For x86_64,
> something slightly similar happens: The thread pointer is stored in
> FS.base, which is an MSR the kernel has to set for us, but we can read
> it with FS-relative addressing.
> 
> Incidentally: Is there any interest in using the "wrfsbase" instruction
> for that, where available? From a cursory first glance, it looks like
> that would mean that musl would have to do the entire CPUID dance on
> AMD64 and i386, and in the latter case the dance would be a bit longer
> since the ID bit dance would have to precede it.

No. Even a single insn to test the stored result of whether such a
feature is available (in practice it would take several and a branch)
is more expensive than loading from %fs:0. And even without a runtime
test, it would cost at best the same as, and possibly more than,
loading from %fs:0.
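
To make that cost concrete, here is a sketch of the CPUID check such a
feature test would rest on, assuming the <cpuid.h> helpers shipped with
GCC and Clang (musl itself has no such code; the function name
cpu_has_fsgsbase() is hypothetical). __get_cpuid_count() first queries
the maximum leaf through __get_cpuid_max(), which on i386 also performs
the EFLAGS.ID toggle test, i.e. the "ID bit dance". CPUID only reports
hardware support: user-space wrfsbase additionally needs the kernel to
enable CR4.FSGSBASE, which Linux does only from 5.9 on.

```c
#include <stdbool.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>

/* CPUID leaf 7, subleaf 0, EBX bit 0 reports FSGSBASE in hardware. */
#define FSGSBASE_EBX_BIT (1u << 0)

static bool cpu_has_fsgsbase(void)
{
    unsigned int eax, ebx, ecx, edx;

    /* Returns 0 if leaf 7 is not supported; on i386 the underlying
       __get_cpuid_max() first checks that CPUID exists at all. */
    if (!__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx))
        return false;
    /* Hardware support only: without kernel enablement of
       CR4.FSGSBASE, wrfsbase raises #UD in user mode. */
    return (ebx & FSGSBASE_EBX_BIT) != 0;
}
#else
static bool cpu_has_fsgsbase(void)
{
    return false;   /* not an x86 target */
}
#endif
```

Any caller would then have to load a cached result and branch on it
before every use, which is the overhead weighed above against a single
load from %fs:0.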

> Back to setting the thread pointer: The relevant code is in __init_tp(),
> which is always called with the return value from __copy_tls(), which
> points to the new thread descriptor. __init_tp() will then call
> __set_thread_area() with the adjusted thread pointer, and on AMD64, this
> will just call arch_prctl(ARCH_SET_FS, p). Though I don't know why that
> function has to be in assembly.
> 
> OK, got it. After this, FS.base will point directly at the TD, so we can
> just load FS.base into any register and have a thread pointer, right?
> Enter __pthread_self():
> 
> static inline struct pthread *__pthread_self()
> {
> 	struct pthread *self;
> 	__asm__ ("mov %%fs:0,%0" : "=r" (self) );
> 	return self;
> }
> 
> But that is not the same thing! This will load FS.base, and then
> dereference it and load the qword it is pointing at into a register. So
> how did this ever work? Well, the answer is back in __init_tp():
> 
> 	td->self = td;
> 
> And of course, "self" is the first member of struct pthread.
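
The self pointer can be observed directly on x86_64 Linux. This sketch
(the function name fs_points_to_itself() is hypothetical) asks the
kernel for FS.base with arch_prctl(ARCH_GET_FS) and compares it with
the word loaded by the same "mov %fs:0" that __pthread_self() uses.
They should match because the descriptor's first member points back at
the descriptor; glibc follows the same ABI, so the check is expected to
hold there too.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE        /* for syscall() in <unistd.h> */
#endif
#include <stdint.h>

#if defined(__x86_64__) && defined(__linux__)
#include <asm/prctl.h>     /* ARCH_GET_FS */
#include <sys/syscall.h>   /* SYS_arch_prctl */
#include <unistd.h>        /* syscall() */

/* Returns 1 if the word stored at FS.base equals FS.base itself
   (td->self == td), 0 if not, -1 if arch_prctl fails. */
static int fs_points_to_itself(void)
{
    unsigned long base = 0;
    uintptr_t word;

    /* FS.base, as set earlier by arch_prctl(ARCH_SET_FS, ...). */
    if (syscall(SYS_arch_prctl, ARCH_GET_FS, &base) != 0)
        return -1;
    /* The load __pthread_self() performs: memory at FS.base + 0. */
    __asm__ ("mov %%fs:0,%0" : "=r" (word));
    return word == (uintptr_t)base;
}
#else
static int fs_points_to_itself(void)
{
    return -1;   /* only meaningful on x86_64 Linux */
}
#endif
```

</imports>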
> 
> So, now the question I've been building up to: Is that intentional? Is

Yes, this is intentional. It's the documented ABI for x86[_64], and
necessary for the operation of code generated by a compiler
conforming to the ABI that takes &tlsvar via the initial-exec or
local-exec model.
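
For illustration, a hand-written copy of the x86_64 local-exec sequence
for &tlsvar, following the sequence given in the ELF TLS ABI
("mov %fs:0" followed by "lea var@tpoff(reg)"). The variable
tls_demo_var and the function tls_addr_local_exec() are hypothetical,
and the asm assumes an ELF x86_64 target; elsewhere the sketch falls
back to the compiler's own &tls_demo_var. If %fs:0 did not hold the
thread pointer itself, the computed address would be wrong.

```c
/* Hypothetical TLS variable; local-exec is valid for a variable
   defined in the main executable. */
__thread int tls_demo_var = 42;

#if defined(__x86_64__) && defined(__ELF__)
/* Local-exec sequence for &tls_demo_var: load the thread pointer from
   %fs:0, then add the link-time constant offset tls_demo_var@tpoff
   (negative, since the executable's TLS block sits below the TP). */
static int *tls_addr_local_exec(void)
{
    int *p;
    __asm__ ("mov %%fs:0,%0\n\t"
             "lea tls_demo_var@tpoff(%0),%0"
             : "=r" (p));
    return p;
}
#else
static int *tls_addr_local_exec(void)
{
    return &tls_demo_var;   /* let the compiler pick the sequence */
}
#endif
```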

> there a reason for there to be a pointer pointing to itself, other than
> the "mov" in __pthread_self()? Could that mov not be replaced with a
> "lea" and save one useless memory access?

The effective address computed by lea is only the offset within the
%fs or %gs segment (0 here), not the segment base plus that offset, so
it's not useful.
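
A quick way to see the difference, assuming x86_64 and GNU assembler
syntax (the function names are hypothetical): the segment override does
not contribute to the effective address, so "lea %fs:0,reg" yields 0,
while "mov %fs:0,reg" yields the stored self pointer. Some binutils
versions warn that a segment override on lea is ineffectual, which is
exactly the point.

```c
#include <stdint.h>

#if defined(__x86_64__)
/* mov reads memory at FS.base + 0, i.e. the stored self pointer. */
static uintptr_t load_fs0(void)
{
    uintptr_t v;
    __asm__ ("mov %%fs:0,%0" : "=r" (v));
    return v;
}

/* lea only computes the effective address (the offset, 0 here); the
   %fs override plays no part, so FS.base never enters the result. */
static uintptr_t lea_fs0(void)
{
    uintptr_t v;
    __asm__ ("lea %%fs:0,%0" : "=r" (v));
    return v;
}
#endif
```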

Rich