From: Rich Felker
Newsgroups: gmane.linux.lib.musl.general
Subject: Re: TLSDESC register-preserving mess
Date: Tue, 9 Oct 2018 22:35:46 -0400
Message-ID: <20181010023546.GM17110@brightrain.aerifal.cx>
References: <20181010012620.GL17110@brightrain.aerifal.cx>
In-Reply-To: <20181010012620.GL17110@brightrain.aerifal.cx>
Reply-To: musl@lists.openwall.com
To: musl@lists.openwall.com

On Tue, Oct 09, 2018 at 09:26:20PM -0400, Rich Felker wrote:
> I've run across a bit of a problem in how the TLSDESC calling
> conventions work.
> In the case where the needed DTV slot is not yet
> filled in for the calling thread, the dynamic TLSDESC function needs
> to call into C code that obtains the memory that was previously
> reserved for it, initializes it (involving memcpy/memset), and fills
> in the DTV entry for it. This requires saving and restoring any
> call-clobbered registers that might be used by C code.
>
> Because the operation involves memcpy/memset, it's not just
> theoretically possible but likely that vector registers could be used.
> As written, the aarch64 and arm asm save and restore float/vector
> registers around the call, but I don't think they're future-proof
> against ISA extensions that add more such registers; if libc were
> built to use such a future ISA level, the asm we have now would be
> unsafe. The i386 and x86_64 tlsdesc asm do not presently do anything
> to save float/vector registers, and doing so would involve lots of
> hwcap mess to figure out which ones are present. I think it would also
> fail to be future-proof. Fortunately, i386 and x86_64 both provide
> non-vector asm implementations of memcpy and memset, making it less
> likely that any vector registers would be used in these code paths,
> but still not impossible. It's also a hidden constraint, that things
> only work because of the asm implementation details.
>
> Unfortunately making a future-proof solution is really hard; this is a
> consequence of the TLSDESC ABI and the way register file extensions
> get done by cpu vendors.
>
> One approach would be generating a fully-flattened version of
> __tls_get_new for each arch that uses TLSDESC, via gcc -S, and
> committing the output into the project as a source file.
> Unfortunately, this involves atomics whose definitions vary by ISA
> level on arm, so I think that makes it a no-go. Obviously it's also
> really ugly.
>
> Another approach is to depend on the compiler having flags that can be
> used to build for a profile that only allows GPRs (no vector regs,
> etc.), and building __tls_get_new as its own source file using these
> flags. This is not the sort of tooling requirement I like, since it
> abandons the principle of working with an arbitrary compiler with
> minimal GNU C features.
>
> The only approach I know that doesn't involve any tooling is having
> the dynamic TLSDESC function raise a signal when it's missing the DTV
> slot it needs. This delegates the responsibility for awareness of what
> registers need saving to the kernel, which already must be aware in
> order to perform context switching (you inherently can't run a binary
> that uses new registers on an old kernel that's not aware of them).
> This approach is nice in that it's entirely arch-agnostic, and works
> for all present and future archs and ISA/register-file extensions. The
> easy approach would just nab another SIGRTx as an
> implementation-internal signal, so that all the asm would need to do
> is a tkill syscall. Multiplexing on another signal should be possible
> but makes for more complexity and I'm not sure there's any real
> benefit.
>
> My leaning is to go with the signal solution.

An alternate approach being proposed on #musl that I might like better
is getting rid of __tls_get_new entirely, having the DTV for all
existing threads updated at dlopen time. This requires either a
__synccall with no failure path (which we don't have) or adding a
linked list of threads. The non-__synccall approach also requires the
SYS_membarrier syscall (Linux 4.3) and emulation of it as a fallback
(which can be done via signals if you have a list of threads).

Aside from solving the tlsdesc clobber issue, what I like about this
approach is that it removes all branches from __tls_get_addr and the
dynamic tlsdesc function; they just *always succeed in the hot path*.
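
To make the "always succeed" point concrete, here is a rough sketch of
the eager scheme as a single-threaded toy model. It is not musl code:
struct thread, struct tls_module, install_tls_for_all, tls_addr and
MAX_MODULES are all made-up names, and the synchronization mentioned
above (thread list locking, membarrier or its fallback) is left out
entirely.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified model; none of these names are musl's. */
#define MAX_MODULES 8

struct tls_module {
    size_t id;          /* index into each thread's DTV */
    const void *image;  /* initialized TLS data to copy */
    size_t image_len;   /* number of bytes in image */
    size_t size;        /* total block size; the tail is zero-filled */
};

struct thread {
    void *dtv[MAX_MODULES]; /* dtv[id] points at this thread's block */
    struct thread *next;    /* linked list of all existing threads */
};

/* Eagerly give every existing thread its block for module m, as would
 * happen at dlopen time. Returns 0 on success, or -1 on allocation
 * failure, in which case no thread's DTV has been touched.
 * Assumption: the caller has already stopped the thread list from
 * changing; real code would also need barriers before publishing. */
static int install_tls_for_all(struct thread *threads,
                               const struct tls_module *m)
{
    assert(m->id < MAX_MODULES && m->image_len <= m->size);

    size_t n = 0;
    for (struct thread *t = threads; t; t = t->next) n++;

    /* Pass 1: allocate and initialize all blocks before publishing any,
     * so a failure can be rolled back without a half-updated state. */
    void **blocks = calloc(n ? n : 1, sizeof *blocks);
    if (!blocks) return -1;
    for (size_t i = 0; i < n; i++) {
        blocks[i] = calloc(1, m->size ? m->size : 1);
        if (!blocks[i]) {
            while (i--) free(blocks[i]);
            free(blocks);
            return -1;
        }
        if (m->image_len) memcpy(blocks[i], m->image, m->image_len);
    }

    /* Pass 2: publish each block into its thread's DTV slot. */
    size_t i = 0;
    for (struct thread *t = threads; t; t = t->next)
        t->dtv[m->id] = blocks[i++];
    free(blocks);
    return 0;
}

/* Lookup path: no "is the slot filled in yet?" test and no call into
 * C allocation code, because the slot was filled at install time. */
static void *tls_addr(const struct thread *self, size_t id, size_t offset)
{
    return (char *)self->dtv[id] + offset;
}
```

All of the memcpy/memset-style work ends up in install_tls_for_all,
which runs as ordinary C under the normal calling convention, so
nothing reached from the TLSDESC entry point can clobber registers
the caller expects to be preserved.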
It also makes it easier to facilitate recovery of memory allocated for
dynamic TLS if we want to -- it no longer has to be a shared block
doled out to threads via a_fetch_add, so each thread could get its own
malloc and then be able to free it at exit.

Rich