From: Rich Felker
Date: Thu, 11 Apr 2019 11:09:59 -0400
To: musl@lists.openwall.com
Reply-To: musl@lists.openwall.com
Subject: Re: Thread-local memory for thread structures
Message-ID: <20190411150959.GW23599@brightrain.aerifal.cx>

On Thu, Apr 11, 2019 at 12:12:46PM +0100, Raphael Cohn wrote:
> Dear List,
>
> I'm playing around with allocating 100s of bytes of TLS memory for
> various purposes. Something I noticed in the code for creating the
> mmap'd memory for TLS is that it does not (quite reasonably) assign
> it a NUMA memory policy.
>
> I'd like to assign a NUMA memory policy to the memory used for
> managing a thread. Is there anything 'underhanded' I can do to find
> out its location and size? I realize anything is likely to be
> brittle. Ideally what I'd like is a 'set NUMA memory policy of this
> thread's mmap'd management memory to the local NUMA node' [once
> I've scheduled it to run on a particular set of CPUs].
>
> Any suggestions?

This is an interesting question.

First, keep in mind that the thread structure and all TLS must be
accessible from all threads of the process. These objects have
addresses which can be taken and passed around, and the thread
structure will be touched by other threads for things like
cancellation, linking/unlinking new/exiting threads from the thread
list, joining, etc. So whatever you do needs to preserve
accessibility and just tweak what's efficient. Ideally the scheduler
on a NUMA kernel would do this for you based on access patterns or
such.

Now, on to "how you'd do it":

At first I thought pthread_getattr_np would give you the info via the
stack size, but no, it covers only the actual stack, not the TLS or
thread structure area. Normally these areas are contiguous with the
stack; the exception is when you manually allocate a stack with
pthread_attr_setstack, in which case pthread_create will allocate
separate space for the thread structure and TLS unless they're under
both a certain absolute size and a certain percentage of the stack
size (basically, to guarantee that large TLS doesn't leave you with a
significantly smaller stack than you expected).

So for now, if you're not doing custom stacks, TLS and the thread
structure will be in the same mapping as the stack (you could find
its extents via /proc/self/maps or something).
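For example, something like this rough, untested sketch would do it.
It goes through syscall() for mbind, since the wrapper lives in
libnuma rather than libc, hard-codes the MPOL_PREFERRED value from
linux/mempolicy.h, and the helper name is just made up for
illustration:

#define _GNU_SOURCE
#include <stdio.h>
#include <inttypes.h>
#include <unistd.h>
#include <sys/syscall.h>

#define MPOL_PREFERRED 1   /* from linux/mempolicy.h */

/* Find the /proc/self/maps entry containing an address on this
 * thread's stack (which, per the above, also holds the TLS and
 * thread structure) and set a preferred-node policy on the whole
 * mapping.  Pages already faulted in stay where they are unless you
 * also pass mbind's move flags. */
static long bind_thread_mapping(int node)
{
	int probe;                 /* lives on this thread's stack */
	uintptr_t p = (uintptr_t)&probe, lo, hi;
	unsigned long mask = 1UL << node;
	long r = -1;
	FILE *f = fopen("/proc/self/maps", "r");
	if (!f) return -1;
	while (fscanf(f, "%" SCNxPTR "-%" SCNxPTR "%*[^\n]",
	              &lo, &hi) == 2) {
		if (p >= lo && p < hi) {
			r = syscall(SYS_mbind, lo, hi-lo,
			            MPOL_PREFERRED, &mask,
			            8*sizeof mask, 0);
			break;
		}
	}
	fclose(f);
	return r;
}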
Now, that's probably a bad idea to rely on, because at some point we
might add an extra guard page between the stack and the TLS/thread
structure for hardening, so that stack-based overflows can't clobber
the TLS or thread structure. It's also not true for the main thread,
where TLS and the thread structure will be in .bss or mmap-allocated
memory (depending on size), separate from the main thread's stack.

One dumb idea would be taking &errno and looking for the map in
/proc/self/maps that contains it. This would cover all static TLS and
the thread structure, since the ABI constrains them to be contiguous.
It won't cover dynamic TLS.

Another idea would be calling __tls_get_addr for each module, using
the module IDs provided by dl_iterate_phdr. This will be offset by an
arch-dependent adjustment you need to be aware of, however. It looks
like the dlpi_tls_data field of the dl_iterate_phdr callback
structure is also supposed to contain a pointer to the calling
thread's TLS region for the module (pointer to beginning? end?), but
we actually seem to have this wrong in musl right now -- we're giving
a pointer to the module's TLS image used for instantiating new
threads' TLS. Note that you can also obtain the size from
dl_iterate_phdr by using the PT_TLS program header.

Now, this is going to be a lot less useful than you'd think, because
dynamic TLS tends to be small and is allocated by malloc, not mmap,
so it won't be page-aligned or per-thread. In fact, right now, it's
allocated contiguously *by library*, not *by thread*, which is pretty
awful for NUMA. Fortunately this only applies to threads that already
existed when dlopen was called; new threads get all existing TLS
allocated as if it were static. Since 1.1.22 changed how dynamic TLS
is installed, I do intend to change the point and strategy of
allocation, and I'll keep NUMA in mind when I do (i.e. allocate each
thread's new DTLS as a unit rather than allocating each library's).

Rich
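P.S. A rough, untested sketch of the dl_iterate_phdr/__tls_get_addr
walk described above, in case it helps as a starting point. It
assumes x86_64, where the arch-dependent adjustment is zero and the
TLS ABI passes __tls_get_addr a pointer to a {module id, offset}
pair; the callback name is made up, and __tls_get_addr isn't a public
interface, so treat it purely as illustration:

#define _GNU_SOURCE
#include <link.h>
#include <stdio.h>

/* Not declared in any public header; on x86_64 the TLS ABI takes a
 * pointer to a {module id, offset} pair. */
extern void *__tls_get_addr(size_t *);

static int tls_cb(struct dl_phdr_info *info, size_t size, void *data)
{
	for (size_t i = 0; i < info->dlpi_phnum; i++) {
		if (info->dlpi_phdr[i].p_type != PT_TLS) continue;
		size_t idx[2] = { info->dlpi_tls_modid, 0 };
		void *p = __tls_get_addr(idx); /* caller's TLS block */
		printf("%s: mod %zu at %p, %zu bytes\n",
		       info->dlpi_name, idx[0], p,
		       (size_t)info->dlpi_phdr[i].p_memsz);
	}
	return 0;
}

/* Run dl_iterate_phdr(tls_cb, 0) from the thread whose TLS you care
 * about; each module with a PT_TLS header gets reported once. */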