mailing list of musl libc
From: Rich Felker <dalias@libc.org>
To: musl@lists.openwall.com
Subject: Re: TLSDESC register-preserving mess
Date: Tue, 9 Oct 2018 22:35:46 -0400
Message-ID: <20181010023546.GM17110@brightrain.aerifal.cx>
In-Reply-To: <20181010012620.GL17110@brightrain.aerifal.cx>

On Tue, Oct 09, 2018 at 09:26:20PM -0400, Rich Felker wrote:
> I've run across a bit of a problem in how the TLSDESC calling
> conventions work. In the case where the needed DTV slot is not yet
> filled in for the calling thread, the dynamic TLSDESC function needs
> to call into C code that obtains the memory that was previously
> reserved for it, initializes it (involving memcpy/memset), and fills
> in the DTV entry for it. This requires saving and restoring any
> call-clobbered registers that might be used by C code.
> 
> Because the operation involves memcpy/memset, it's not just
> theoretically possible but likely that vector registers could be used.
> As written, the aarch64 and arm asm save and restore float/vector
> registers around the call, but I don't think they're future-proof
> against ISA extensions that add more such registers; if libc were
> built to use such a future ISA level, the asm we have now would be
> unsafe. The i386 and x86_64 tlsdesc asm do not presently do anything
> to save float/vector registers, and doing so would involve lots of
> hwcap mess to figure out which ones are present. I think it would also
> fail to be future-proof. Fortunately, i386 and x86_64 both provide
> non-vector asm implementations of memcpy and memset, making it less
> likely that any vector registers would be used in these code paths,
> but still not impossible. It's also a hidden constraint: things only
> work because of details of the asm implementations.
> 
> Unfortunately making a future-proof solution is really hard; this is a
> consequence of the TLSDESC ABI and the way register file extensions
> get done by cpu vendors.
> 
> One approach would be generating a fully-flattened version of
> __tls_get_new for each arch that uses TLSDESC, via gcc -S, and
> committing the output into the project as a source file.
> Unfortunately, this involves atomics, whose definitions vary by ISA
> level on arm, so I think that makes it a no-go. Obviously it's also
> really ugly.
> 
> Another approach is to depend on the compiler having flags that can be
> used to build for a profile that only allows GPRs (no vector regs,
> etc.), and building __tls_get_new as its own source file using these
> flags. This is not the sort of tooling requirement I like, since it
> abandons the principle of working with an arbitrary compiler with
> minimal GNU C features.
> 
> The only approach I know that doesn't involve any tooling is having
> the dynamic TLSDESC function raise a signal when it's missing the DTV
> slot it needs. This delegates the responsibility for awareness of what
> registers need saving to the kernel, which already must be aware in
> order to perform context switching (you inherently can't run a binary
> that uses new registers on an old kernel that's not aware of them).
> This approach is nice in that it's entirely arch-agnostic, and works
> for all present and future archs and ISA/register-file extensions. The
> easy approach would just nab another SIGRTx as an
> implementation-internal signal, so that all the asm would need to do
> is a tkill syscall. Multiplexing on another signal should be possible
> but makes for more complexity and I'm not sure there's any real
> benefit.
> 
> My leaning is to go with the signal solution.
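
A minimal sketch of the signal idea quoted above, not musl's actual code.
It assumes a single reserved block and a single slot; the names
reserved_block, dtv_slot, fill_slot_handler, install_fill_handler and
get_tls are hypothetical, and SIGRTMIN stands in for whatever
implementation-internal signal would be reserved. The point it
illustrates is that the kernel saves and restores the full register
state around the handler, so the caller needs no knowledge of which
vector registers exist.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <string.h>

#define TLS_SIZE     64   /* hypothetical size of the module's TLS block */
#define TLS_INIT_LEN 5    /* hypothetical length of its initialized part */

static const char tls_image[TLS_INIT_LEN] = { 'h', 'e', 'l', 'l', 'o' };
static char reserved_block[TLS_SIZE];  /* memory reserved ahead of time */
static void *volatile dtv_slot;        /* stand-in for one DTV entry */

/* Runs in signal context. The kernel has already saved every register
 * it knows about, so memcpy/memset may freely use vector registers. */
static void fill_slot_handler(int sig)
{
    (void)sig;
    memcpy(reserved_block, tls_image, TLS_INIT_LEN);
    memset(reserved_block + TLS_INIT_LEN, 0, TLS_SIZE - TLS_INIT_LEN);
    dtv_slot = reserved_block;
}

/* Installs the handler; returns 0 on success, -1 on failure. */
static int install_fill_handler(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = fill_slot_handler;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGRTMIN, &sa, 0);
}

/* Stand-in for the dynamic TLSDESC function: if the slot is missing,
 * raise the signal; raise() returns only after the handler has run. */
static void *get_tls(size_t offset)
{
    if (!dtv_slot)
        raise(SIGRTMIN);
    return (char *)dtv_slot + offset;
}
```

In real asm the slow path would be a tkill syscall on the calling
thread rather than raise(), and the slot would be per-thread.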

An alternate approach being proposed on #musl, which I might like
better, is getting rid of __tls_get_new entirely and having the DTV for
all existing threads updated at dlopen time. This requires either a
__synccall with no failure path (which we don't have) or adding a
linked list of threads. The non-__synccall approach also requires the
SYS_membarrier syscall (Linux 4.3) and emulation of it as a fallback
(which can be done via signals if you have a list of threads).
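
As a rough illustration of the membarrier-with-fallback shape described
above, here is a sketch under stated assumptions. barrier_all_threads
and count_fallback are hypothetical names; the fallback is only a
placeholder for code that would signal every thread on the list and
wait for each handler to run. The command values are written out
locally so the sketch does not depend on having <linux/membarrier.h>.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <unistd.h>
#include <sys/syscall.h>

/* Value of MEMBARRIER_CMD_SHARED in the Linux UAPI headers (later also
 * named MEMBARRIER_CMD_GLOBAL); defined locally for this sketch. */
#define CMD_SHARED 1

/* Try the kernel-provided barrier first; if the syscall is missing or
 * refused, use the caller-supplied emulation instead.
 * Returns 0 on success, or whatever the fallback returns. */
static int barrier_all_threads(int (*fallback)(void))
{
#ifdef SYS_membarrier
    if (syscall(SYS_membarrier, CMD_SHARED, 0, 0) == 0)
        return 0;
#endif
    return fallback();
}

/* Placeholder fallback for demonstration only. A real one would walk
 * the thread list, signal each thread, and wait for all handlers. */
static int fallback_calls;
static int count_fallback(void)
{
    fallback_calls++;
    return 0;
}
```

Either path ends with every thread having passed through a barrier,
which is what the DTV update needs before the new slots can be relied
upon.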

Aside from solving the tlsdesc clobber issue, what I like about this
approach is that it removes all branches from __tls_get_addr and the
dynamic tlsdesc function; they just *always succeed in the hot path*.
It also makes it easier to facilitate recovery of memory allocated for
dynamic TLS if we want to -- it no longer has to be a shared block
doled out to threads via a_fetch_add, so each thread could get its own
malloc and then be able to free it at exit.
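
The scheme above can be sketched in a single-threaded model as follows.
This is an assumption-laden illustration, not musl's implementation:
struct thread, MAX_MODULES, install_tls, tls_addr and
thread_exit_cleanup are all hypothetical, and the locking and barriers
a real multi-threaded version needs are omitted. Allocation happens for
all threads before anything is committed, so a failed dlopen leaves
every DTV untouched.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MAX_MODULES 8         /* hypothetical fixed DTV capacity */

struct thread {
    struct thread *next;      /* linked list of live threads */
    void *dtv[MAX_MODULES];   /* one slot per loaded TLS module */
    void *pending;            /* block staged for the next commit */
};

/* dlopen-time step: give every existing thread its own initialized
 * copy of the new module's TLS. Returns 0 on success, or -1 if any
 * allocation fails, in which case no DTV is modified. */
static int install_tls(struct thread *threads, size_t modid,
                       const void *image, size_t len, size_t size)
{
    struct thread *t;
    assert(modid < MAX_MODULES && size > 0 && len <= size);
    /* Phase 1: allocate and initialize; this is the only part that can fail. */
    for (t = threads; t; t = t->next) {
        t->pending = malloc(size);
        if (!t->pending)
            goto fail;
        memcpy(t->pending, image, len);
        memset((char *)t->pending + len, 0, size - len);
    }
    /* Phase 2: commit; nothing here can fail. */
    for (t = threads; t; t = t->next) {
        t->dtv[modid] = t->pending;
        t->pending = 0;
    }
    return 0;
fail:
    for (t = threads; t; t = t->next) {
        free(t->pending);
        t->pending = 0;
    }
    return -1;
}

/* Hot path: the slot is always present, so there is no branch. */
static void *tls_addr(struct thread *self, size_t modid, size_t offset)
{
    return (char *)self->dtv[modid] + offset;
}

/* Thread exit: each block is the thread's own, so it can be freed. */
static void thread_exit_cleanup(struct thread *self)
{
    for (size_t i = 0; i < MAX_MODULES; i++) {
        free(self->dtv[i]);
        self->dtv[i] = 0;
    }
}
```

Because tls_addr never has to allocate, it needs no call into C code
from the TLSDESC asm and therefore no register saving at all.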

Rich


