From mboxrd@z Thu Jan 1 00:00:00 1970 X-Msuck: nntp://news.gmane.org/gmane.linux.lib.musl.general/12842 Path: news.gmane.org!.POSTED!not-for-mail From: Rich Felker Newsgroups: gmane.linux.lib.musl.general Subject: Re: TLS issue on aarch64 Date: Sat, 26 May 2018 20:17:06 -0400 Message-ID: <20180527001706.GF1392@brightrain.aerifal.cx> References: <20180525145059.GG4418@port70.net> Reply-To: musl@lists.openwall.com NNTP-Posting-Host: blaine.gmane.org Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Trace: blaine.gmane.org 1527380117 30550 195.159.176.226 (27 May 2018 00:15:17 GMT) X-Complaints-To: usenet@blaine.gmane.org NNTP-Posting-Date: Sun, 27 May 2018 00:15:17 +0000 (UTC) User-Agent: Mutt/1.5.21 (2010-09-15) To: musl@lists.openwall.com Original-X-From: musl-return-12858-gllmg-musl=m.gmane.org@lists.openwall.com Sun May 27 02:15:12 2018 Return-path: Envelope-to: gllmg-musl@m.gmane.org Original-Received: from mother.openwall.net ([195.42.179.200]) by blaine.gmane.org with smtp (Exim 4.84_2) (envelope-from ) id 1fMjL6-0007qw-Pg for gllmg-musl@m.gmane.org; Sun, 27 May 2018 02:15:12 +0200 Original-Received: (qmail 9965 invoked by uid 550); 27 May 2018 00:17:19 -0000 Mailing-List: contact musl-help@lists.openwall.com; run by ezmlm Precedence: bulk List-Post: List-Help: List-Unsubscribe: List-Subscribe: List-ID: Original-Received: (qmail 9944 invoked from network); 27 May 2018 00:17:19 -0000 Content-Disposition: inline In-Reply-To: <20180525145059.GG4418@port70.net> Original-Sender: Rich Felker Xref: news.gmane.org gmane.linux.lib.musl.general:12842 Archived-At: On Fri, May 25, 2018 at 04:50:59PM +0200, Szabolcs Nagy wrote: > > yeah i think on aarch64 (which is TLS_ABOVE_TP) musl expects a layout > > -------------------------- ----------- - - > | struct pthread | 16 | tlsvar > -------------------------- ----------- - - > ^ ^ <-----> > self tp tls offset > (aligned) I don't think this is correct; if so it's a bug. The intent is only that musl expects the point where TLS begins (the beginning of your tlsvar area above) to be aligned, tp to point 16 below it, and self to point sizeof(struct pthread) below that. > while it should be > > -------------------------- ----------- - - > | struct pthread | 16 | tlsvar > -------------------------- ----------- - - > ^ ^<--------------> > self tp tls offset > (aligned) The "tls offset" described here (including the 16 reserved bytes) doesn't seem to match the "tls offset" in the sense of offset from the beginning of the PT_TLS image. That seems to be the core issue here. > i think the constraints for tp are: > > - tp must be aligned to 'tls_align' I don't see this as an explicit requirement anywhere, but it falls out implicitly if the linker assigns offsets beginning at max(16,tls_align) (and adjusted by that much vs the PT_TLS segment). > - tp must be at a small fixed offset from the end > of pthread struct (so asm code can access the dtv) > > - tp + off must be usable memory for tls for off >= 16 > (this is aarch64 specific) The constraint is not "usable memory" but that it lines up with where we're pasting the PT_TLS image. > i'm not yet sure what's the best fix. > > > Hence the code tries to access the TLS variables at the wrong location. > > > > The following patch fixes the issue, but only if musl is then compiled with > > optimizations off. With optimizations, the compiler emits the *same* code for > > both variants. Also, the patch probably has some unexpected side-effects, too - > > I'm just adding it here as a starting point for further debugging. > > > > Any help is greatly appreciated :-) > > > > - Phillip > > > > ---- > > > > diff --git a/arch/aarch64/pthread_arch.h b/arch/aarch64/pthread_arch.h > > index b2e2d8f..c69f6f1 100644 > > --- a/arch/aarch64/pthread_arch.h > > +++ b/arch/aarch64/pthread_arch.h > > @@ -2,10 +2,10 @@ static inline struct pthread *__pthread_self() > > { > > char *self; > > __asm__ __volatile__ ("mrs %0,tpidr_el0" : "=r"(self)); > > - return (void*)(self + 16 - sizeof(struct pthread)); > > + return (void*)(self - sizeof(struct pthread)); > > } > > > > #define TLS_ABOVE_TP > > -#define TP_ADJ(p) ((char *)(p) + sizeof(struct pthread) - 16) > > +#define TP_ADJ(p) ((char *)(p) + sizeof(struct pthread)) > > > > this is ok, but wastes 16 bytes after the pthread struct > where will be no tls data (and i guess the allocated tls > size should be adjusted accordingly as well as tlsdesc.s). > > > #define MC_PC pc > > diff --git a/src/env/__init_tls.c b/src/env/__init_tls.c > > index b125eb1..3a3c307 100644 > > --- a/src/env/__init_tls.c > > +++ b/src/env/__init_tls.c > > @@ -42,7 +42,7 @@ void *__copy_tls(unsigned char *mem) > > > > mem += -((uintptr_t)mem + sizeof(struct pthread)) & (libc.tls_align-1); > > td = (pthread_t)mem; > > - mem += sizeof(struct pthread); > > + mem += sizeof(struct pthread) + libc.tls_align; > > > > i don't think this works with the above TP_ADJ, but > yes something has to be done differently here to make > aarch64 happy. As noted later in the thread, the issue does not seem to be aarch64-specific, but rather applies whenever the TLS ABI has some reserved space between the thread pointer and beginning of TLS (nonzero offset in TP_ADJ? not exactly since on mips, etc. the offset is not reserved space but just accounting for ridiculous signed 16-bit offsets). Rich