From: Szabolcs Nagy
Newsgroups: gmane.linux.lib.musl.general
Subject: Re: TLS issue on aarch64
Date: Fri, 25 May 2018 16:50:59 +0200
Message-ID: <20180525145059.GG4418@port70.net>
To: musl@lists.openwall.com

* Phillip Berndt [2018-05-25 14:40:14 +0200]:
> I'm experiencing a TLS-related error with musl on aarch64.
> This is my test program:
>
> ----8<--------------
> #include <stdio.h>
>
> __thread int foo = 1234;
> __thread int bar __attribute__((aligned(0x100))) = 5678;
>
> int main(int argc, char **argv)
> {
>     printf("0x%p: %d\n", &foo, foo);
>     printf("0x%p: %d\n", &bar, bar);
>
>     return 0;
> }
> ---->8---------------
>
> I'm compiling this into a static binary with -O2 -static. With glibc and
> gcc 5.4.0 (tried with 7.2 as well), this gives me the expected output. With
> musl libc 1.1.19, I instead see
>
> ----8<--------------
> 0x0x7fa7adf2f0: 0
> 0x0x7fa7adf3f0: 0
> ---->8---------------
>
> Note that this is the wrong address (not aligned), and that the memory has
> unexpected content as well.
>
> I did some initial debugging, but now I'm stuck and need some help. What I've
> found so far:
>
> * GCC apparently emits code that expects the tpidr_el0 register to contain a
>   pointer to the TLS memory, and it expects that the loader unconditionally
>   offsets the first variable by the TLS alignment into said memory:
>
>   Disassembly of the code that loads &foo:
>   ----8<--------------
>   4001a4: d53bd053    mrs x19, tpidr_el0
>   4001a8: 91400273    add x19, x19, #0x0, lsl #12
>   4001ac: 91040273    add x19, x19, #0x100
>   ----8<--------------
>
>   (If I align the variable by 0x1000 instead then the code changes
>   accordingly.)
>
> * Musl, on the other hand, in __copy_tls, initializes tpidr_el0 with a
>   pointer 16 bytes before the end of struct pthread, and copies the TLS
>   initialization image directly behind that struct, without adding extra
>   padding.
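The mismatch described in the two bullets above can be reproduced with a little arithmetic. This is an illustrative sketch, not musl or GCC code: it assumes the aarch64 rule that the executable's TLS block starts at tp + align_up(16, tls_align), and it models musl 1.1.19 as always placing that block 16 bytes above tp, as the second bullet describes. The helper names `align_up`, `abi_tls_offset`, `musl_tls_offset` and `misplacement` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Round x up to a multiple of a; a must be a power of two. */
static uintptr_t align_up(uintptr_t x, uintptr_t a)
{
	assert(a != 0 && (a & (a - 1)) == 0);
	return (x + a - 1) & ~(a - 1);
}

/* tp-relative offset where the aarch64 toolchain expects the
 * executable's TLS block: the 16 bytes reserved above tp, rounded
 * up to the TLS segment alignment (0x100 for the test program). */
static uintptr_t abi_tls_offset(uintptr_t tls_align)
{
	return align_up(16, tls_align);
}

/* tp-relative offset where musl 1.1.19 puts the block, as described
 * in the report: tp is 16 bytes below the end of struct pthread and
 * the image is copied right after the struct. The alignment is
 * applied to the block's address, so this offset never changes. */
static uintptr_t musl_tls_offset(void)
{
	return 16;
}

/* Distance between the address the compiled code uses and the
 * start of the data musl actually initialized. */
static uintptr_t misplacement(uintptr_t tls_align)
{
	return abi_tls_offset(tls_align) - musl_tls_offset();
}
```

For `tls_align = 0x100`, `misplacement(0x100)` is 0xf0. Because musl aligns the block itself to 0x100, the compiler-generated address ends in 0xf0, which is consistent with the ...2f0 and ...3f0 addresses in the report. For alignments of 16 or less the two offsets agree, which would explain why ordinarily aligned TLS variables are unaffected.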

yeah i think on aarch64 (which is TLS_ABOVE_TP) musl expects a layout

 -------------------------- ----------- - -
| struct pthread     | 16 |  tlsvar
 -------------------------- ----------- - -
^                    ^     <----->
self                 tp    tls offset
                           (aligned)

while it should be

 -------------------------- ----------- - -
| struct pthread     | 16 |  tlsvar
 -------------------------- ----------- - -
^                    ^<-------------->
self                 tp   tls offset
                          (aligned)

i think the constraints for tp are:

- tp must be aligned to 'tls_align'
- tp must be at a small fixed offset from the end of the pthread struct
  (so asm code can access the dtv)
- tp + off must be usable memory for tls for off >= 16
  (this is aarch64 specific)

i'm not yet sure what's the best fix.

> Hence the code tries to access the TLS variables at the wrong location.
>
> The following patch fixes the issue, but only if musl is then compiled with
> optimizations off. With optimizations, the compiler emits the *same* code for
> both variants. Also, the patch probably has some unexpected side effects, too;
> I'm just adding it here as a starting point for further debugging.
>
> Any help is greatly appreciated :-)
>
> - Phillip
>
> ----
>
> diff --git a/arch/aarch64/pthread_arch.h b/arch/aarch64/pthread_arch.h
> index b2e2d8f..c69f6f1 100644
> --- a/arch/aarch64/pthread_arch.h
> +++ b/arch/aarch64/pthread_arch.h
> @@ -2,10 +2,10 @@ static inline struct pthread *__pthread_self()
>  {
>  	char *self;
>  	__asm__ __volatile__ ("mrs %0,tpidr_el0" : "=r"(self));
> -	return (void*)(self + 16 - sizeof(struct pthread));
> +	return (void*)(self - sizeof(struct pthread));
>  }
>
>  #define TLS_ABOVE_TP
> -#define TP_ADJ(p) ((char *)(p) + sizeof(struct pthread) - 16)
> +#define TP_ADJ(p) ((char *)(p) + sizeof(struct pthread))
>

this is ok, but wastes 16 bytes after the pthread struct where there will
be no tls data (and i guess the allocated tls size should be adjusted
accordingly, as well as tlsdesc.s).
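A layout that meets the three tp constraints listed above might be computed as in the following sketch. It is a hypothetical illustration under stated assumptions, not musl's code: it follows the patched TP_ADJ above, so tp sits exactly at the end of struct pthread (fixed offset 0), and it keeps the 16 bytes above tp free before the TLS block. `struct tls_layout`, `place_thread` and the `pthread_size` parameter (standing in for `sizeof(struct pthread)`) are invented names, and the dtv and the total allocation size are deliberately left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical summary of where things land in a thread's memory
 * block; not a musl type. */
struct tls_layout {
	uintptr_t self; /* start of struct pthread */
	uintptr_t tp;   /* value that would go into tpidr_el0 */
	uintptr_t tls;  /* start of the executable's TLS block */
};

/* Round x up to a multiple of a; a must be a power of two. */
static uintptr_t align_up(uintptr_t x, uintptr_t a)
{
	return (x + a - 1) & ~(a - 1);
}

/* Place struct pthread and the TLS block above mem so that:
 *  1. tp is aligned to tls_align,
 *  2. tp is at a fixed offset (here 0) from the end of struct pthread,
 *  3. the TLS block starts at tp + align_up(16, tls_align), leaving
 *     the 16 bytes directly above tp unused. */
static struct tls_layout place_thread(uintptr_t mem, size_t pthread_size,
                                      uintptr_t tls_align)
{
	struct tls_layout l;

	assert(tls_align != 0 && (tls_align & (tls_align - 1)) == 0);
	l.tp = align_up(mem + pthread_size, tls_align);
	l.self = l.tp - pthread_size;
	l.tls = l.tp + align_up(16, tls_align);
	return l;
}
```

With `mem = 0x1000`, a 200-byte struct and `tls_align = 0x100`, this yields tp = 0x1100 and the TLS block at 0x1200; with `tls_align = 8` it yields tp = 0x10c8 and the block at tp + 16. The unused 16 bytes above tp are the waste mentioned in the reply.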

>  #define MC_PC pc
> diff --git a/src/env/__init_tls.c b/src/env/__init_tls.c
> index b125eb1..3a3c307 100644
> --- a/src/env/__init_tls.c
> +++ b/src/env/__init_tls.c
> @@ -42,7 +42,7 @@ void *__copy_tls(unsigned char *mem)
>
>  	mem += -((uintptr_t)mem + sizeof(struct pthread)) & (libc.tls_align-1);
>  	td = (pthread_t)mem;
> -	mem += sizeof(struct pthread);
> +	mem += sizeof(struct pthread) + libc.tls_align;
>

i don't think this works with the above TP_ADJ, but yes something has to
be done differently here to make aarch64 happy.

>  	for (i=1, p=libc.tls_head; p; i++, p=p->next) {
>  		dtv[i] = mem + p->offset;
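One concrete way the two hunks together can still disagree with the compiler, assuming TLS alignments below 16 can occur: with the patched TP_ADJ, tp is the (tls_align-aligned) end of struct pthread, so the `__copy_tls` change puts the TLS image at tp + tls_align, whereas the toolchain expects tp + align_up(16, tls_align). This sketch illustrates that one failure mode and is not necessarily the objection raised in the reply; the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Round x up to a multiple of a; a must be a power of two. */
static uintptr_t align_up(uintptr_t x, uintptr_t a)
{
	assert(a != 0 && (a & (a - 1)) == 0);
	return (x + a - 1) & ~(a - 1);
}

/* tp-relative start of the TLS image with both hunks applied:
 * TP_ADJ makes tp the end of struct pthread, and __copy_tls then
 * skips tls_align more bytes before the image. */
static uintptr_t patched_tls_offset(uintptr_t tls_align)
{
	return tls_align;
}

/* tp-relative start assumed by the aarch64 toolchain. */
static uintptr_t abi_tls_offset(uintptr_t tls_align)
{
	return align_up(16, tls_align);
}
```

The offsets agree for alignments of 16 and above (including the 0x100 of the test program), but for an 8-byte alignment the patch would place the data at tp + 8 while the compiled code reads tp + 16. The extra `libc.tls_align` bytes presumably also have to be accounted for in the allocated TLS size.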