From: Rich Felker
Newsgroups: gmane.linux.lib.musl.general
Subject: TLSDESC register-preserving mess
Date: Tue, 9 Oct 2018 21:26:20 -0400
Message-ID: <20181010012620.GL17110@brightrain.aerifal.cx>
Reply-To: musl@lists.openwall.com
To: musl@lists.openwall.com

I've run across a bit of a problem in how the TLSDESC calling conventions work. In the case where the needed DTV slot is not yet filled in for the calling thread, the dynamic TLSDESC function needs to call into C code that obtains the memory that was previously reserved for it, initializes it (involving memcpy/memset), and fills in the DTV entry for it. This requires saving and restoring any call-clobbered registers that might be used by C code.
Because the operation involves memcpy/memset, it's not just theoretically possible but likely that vector registers could be used.

As written, the aarch64 and arm asm save and restore float/vector registers around the call, but I don't think they're future-proof against ISA extensions that add more such registers; if libc were built to use such a future ISA level, the asm we have now would be unsafe.

The i386 and x86_64 tlsdesc asm do not presently do anything to save float/vector registers, and doing so would involve lots of hwcap mess to figure out which ones are present. I think it would also fail to be future-proof. Fortunately, i386 and x86_64 both provide non-vector asm implementations of memcpy and memset, making it less likely that any vector registers would be used in these code paths, but still not impossible. It's also a hidden constraint: things only work because of the asm implementation details.

Unfortunately, making a future-proof solution is really hard; this is a consequence of the TLSDESC ABI and the way register file extensions get done by CPU vendors.

One approach would be generating a fully-flattened version of __tls_get_new for each arch that uses TLSDESC, via gcc -S, and committing the output into the project as a source file. Unfortunately, this involves atomics, whose definitions vary by ISA level on arm, so I think that makes it a no-go. Obviously it's also really ugly.

Another approach is to depend on the compiler having flags that can be used to build for a profile that only allows GPRs (no vector regs, etc.), and building __tls_get_new as its own source file using these flags. This is not the sort of tooling requirement I like, since it abandons the principle of working with an arbitrary compiler with minimal GNU C features.

The only approach I know that doesn't involve any tooling is having the dynamic TLSDESC function raise a signal when it's missing the DTV slot it needs.
This delegates the responsibility for awareness of what registers need saving to the kernel, which already must be aware in order to perform context switching (you inherently can't run a binary that uses new registers on an old kernel that's not aware of them). This approach is nice in that it's entirely arch-agnostic, and works for all present and future archs and ISA/register-file extensions.

The easy approach would just nab another SIGRTx as an implementation-internal signal, so that all the asm would need to do is a tkill syscall. Multiplexing on another signal should be possible but makes for more complexity and I'm not sure there's any real benefit.

My leaning is to go with the signal solution.

Rich
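
For illustration only, here is a rough user-space C sketch of the signal idea on Linux. It is not musl code. The one-slot `dtv_slot`, `reserved_mem`, `tls_image`, and the function names are hypothetical stand-ins, and the caller picks the signal number. In the real thing the fast path would be asm and the slot would be per-thread. The sketch assumes the chosen signal is not blocked in the calling thread, so the handler has run by the time tkill returns. The kernel saves and restores the full register state around the handler.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical stand-ins for the TLS image, the memory reserved for
 * this thread's copy, and a single DTV slot (NULL until filled in). */
static const char tls_image[16] = "tls-init-image";
static char reserved_mem[64];
static void *volatile dtv_slot;

/* Slow path, run as a signal handler: it may freely use memcpy/memset
 * because the kernel preserves the interrupted register state. */
static void fill_slot_handler(int sig)
{
    (void)sig;
    memset(reserved_mem, 0, sizeof reserved_mem);
    memcpy(reserved_mem, tls_image, sizeof tls_image);
    dtv_slot = reserved_mem;
}

/* Install the handler on the implementation-internal signal. */
static int install_fill_handler(int signo)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = fill_slot_handler;
    sigemptyset(&sa.sa_mask);
    return sigaction(signo, &sa, NULL);
}

/* Stand-in for the dynamic TLSDESC function: if the slot is missing,
 * the only work done here is a tkill syscall aimed at ourselves. */
static void *get_tls_addr(int signo)
{
    if (!dtv_slot) {
        long tid = syscall(SYS_gettid);
        syscall(SYS_tkill, tid, signo);
        /* Assumption: signo is unblocked, so the handler already ran. */
        assert(dtv_slot != NULL);
    }
    return dtv_slot;
}
```

A caller would run `install_fill_handler(SIGRTMIN)` once at startup and then call `get_tls_addr(SIGRTMIN)`; the first call takes the signal path and later calls return the filled slot directly.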